Abstract

We investigate the Moreau-Yosida regularization and the associated proximal map in the 
context of discrete gradient flow for the 2-Wasserstein metric. Our main results are a stepwise 
contraction property for the proximal map and an "above the tangent line" inequality for the 
regularization. Using the latter, we prove a Talagrand inequality and an HWI inequality for 
the regularization, under appropriate hypotheses. In the final section, the results are applied
to study the discrete gradient flow for Rényi entropies. As Otto showed, the gradient flow for
these entropies in the 2-Wasserstein metric is a porous medium flow or a fast diffusion flow, 
depending on the exponent of the entropy. We show that a striking number of the remarkable 
features of the porous medium and fast diffusion flows are present in the discrete gradient flow 
and do not simply emerge in the limit as the time-step goes to zero.

1 Introduction

Given a complete metric space $(X, d)$, a functional $E : X \to \mathbb{R} \cup \{\infty\}$, and $\tau > 0$, the Moreau-Yosida
regularization of $E$ is

\[ E_\tau(y) := \inf_{x \in X} \left\{ \frac{1}{2\tau} d(x,y)^2 + E(x) \right\} . \]

The corresponding proximal set $J_\tau : X \to 2^X$ is

\[ J_\tau(y) := \operatorname*{argmin}_{x \in X} \left\{ \frac{1}{2\tau} d(x,y)^2 + E(x) \right\} . \]

If there is a unique element in $J_\tau(y)$, we denote it by $y_\tau$ and call it the proximal point. We call
$y \mapsto y_\tau$ the proximal map.

When $X = \mathcal{H}$ is a Hilbert space, a suitable context in which to develop the theory of the
Moreau-Yosida regularization is the class of functionals that are proper, lower semicontinuous, and
convex. For all such $E$ and $\tau > 0$, the Moreau-Yosida regularization $E_\tau$ is convex and Fréchet
differentiable [16]. Furthermore, its derivative is Lipschitz continuous, and, as $\tau \to 0$, $E_\tau \to E$
pointwise [6]. The Moreau-Yosida regularization provides a way to regularize $E$ that preserves
convexity.

The proximal map is similarly well-behaved for functionals that are proper, lower semicontinuous,
and convex. For each $y \in \mathcal{H}$ and $\tau > 0$, there is a unique proximal point $y_\tau$, so that the
proximal map $y \mapsto y_\tau$ is well-defined on all of $\mathcal{H}$. As shown by Moreau [16], the proximal map is a
contraction in the Hilbert space norm:

\[ \|x_\tau - y_\tau\| \le \|x - y\| \quad \forall x, y \in \mathcal{H} . \]
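
As an illustration of Moreau's contraction property, the following minimal sketch checks it numerically for one convenient choice of functional, $E(x) = \sum_i |x_i|$ on $\mathbb{R}^5$, whose proximal map is componentwise soft-thresholding. The functional, the dimension, the step $\tau = 0.7$, and the helper names `prox_l1` and `worst_ratio` are assumptions made for this example only; they do not come from the paper.

```python
import random

def prox_l1(y, tau):
    """Proximal point of E(x) = sum_i |x_i| at y (componentwise soft-thresholding)."""
    return [max(abs(v) - tau, 0.0) * (1.0 if v > 0 else -1.0) for v in y]

def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(c * c for c in v) ** 0.5

def worst_ratio(trials=1000, dim=5, tau=0.7, seed=0):
    """Largest observed ||x_tau - y_tau|| / ||x - y|| over random pairs (x, y)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(dim)]
        y = [rng.uniform(-3, 3) for _ in range(dim)]
        dist = norm([a - b for a, b in zip(x, y)])
        if dist == 0.0:
            continue  # skip the degenerate pair x == y
        px, py = prox_l1(x, tau), prox_l1(y, tau)
        worst = max(worst, norm([a - b for a, b in zip(px, py)]) / dist)
    return worst

# The ratio should not exceed 1, up to floating-point rounding.
print(worst_ratio())
```

A random search of this kind cannot prove the inequality; it only fails to find a counterexample, which is what the contraction property predicts.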

One of the main reasons for interest in the Moreau-Yosida regularization and proximal map is 
their relation to gradient flow. The gradient flow of a functional E is the Cauchy problem 



\[ \frac{d}{dt} y(t) = -\nabla E(y(t)), \quad y(0) \in D(E) = \{ z \in \mathcal{H} : E(z) < \infty \}, \tag{1.1} \]

which is well-defined as long as $\nabla E$ exists along the flow.$^1$ The Moreau-Yosida regularization
plays a key role in the proof of existence for solutions to the gradient flow [3]. First, one uses the
additional regularity of $E_\tau$ to find solutions to the related gradient flow problem

\[ \frac{d}{dt} y_\tau(t) = -\nabla E_\tau(y_\tau(t)), \quad y_\tau(0) \in D(E). \]

Then, as $\tau \to 0$, the curves $y_\tau(t)$ converge to a curve $y(t)$ that solves (1.1) in an appropriate sense.

The proximal map expresses the discrete dynamics of gradient flow. Specifically, one may use
the proximal map to define the discrete gradient flow sequence

\[ y_n = (y_{n-1})_\tau, \quad y_0 \in D(E), \]

as in [12, 13]. Whenever the proximal map $y \mapsto y_\tau$ is well-defined, we may identify the proximal set
$J_\tau(y)$ with its unique element $y_\tau$ and write $J_\tau^n$ to indicate $n$ repeated applications of the proximal
map. The exponential formula quantifies the sense in which the discrete gradient flow is a discretized
version of gradient flow [6]. If $y(t)$ is a gradient flow with initial conditions $y(0)$, then

\[ y(t) = \lim_{n \to \infty} (J_{t/n})^n (y(0)). \tag{1.2} \]
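
The exponential formula can be seen concretely in the simplest possible case. The sketch below assumes the Hilbert space is the real line and $E(x) = \lambda x^2/2$ with $\lambda = 1$, so that the proximal map has the closed form $y \mapsto y/(1+\lambda\tau)$ and the gradient flow is $y(t) = y(0)e^{-\lambda t}$. These choices, and the names `prox_quadratic` and `discrete_flow`, are assumptions for illustration rather than anything taken from the paper.

```python
import math

def prox_quadratic(y, tau, lam=1.0):
    """Proximal point of E(x) = lam * x**2 / 2 on the real line.

    Minimizing (x - y)**2 / (2 * tau) + lam * x**2 / 2 gives x = y / (1 + lam * tau).
    """
    return y / (1.0 + lam * tau)

def discrete_flow(y0, t, n, lam=1.0):
    """Apply the proximal map n times with time step t / n, i.e. (J_{t/n})^n (y0)."""
    y = y0
    for _ in range(n):
        y = prox_quadratic(y, t / n, lam)
    return y

y0, t = 2.0, 1.5
exact = y0 * math.exp(-t)  # solution of y' = -y at time t
for n in (1, 10, 100, 1000):
    # The error shrinks roughly like 1/n as the time step t/n goes to zero.
    print(n, abs(discrete_flow(y0, t, n) - exact))
```

Here $(J_{t/n})^n(y_0) = y_0 (1 + t/n)^{-n}$, so the convergence in (1.2) reduces to the classical limit defining the exponential.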

More recently, the Moreau-Yosida regularization and proximal map have been applied outside of 
the Hilbert space context to gradient flow in the 2-Wasserstein metric. Briefly, we recall some facts 
about this metric, mainly to establish our notation — see [2] and [22] for more background. We 
present these facts both in the most general setting, without restrictions on the type of probability 
measures we consider, and in a simpler setting, focusing our attention on probability measures with 
finite second moment that are absolutely continuous with respect to Lebesgue measure. While our
results hold in the most general setting, many interesting applications concern only the simpler
setting, in which the exposition and notation are more straightforward.



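$^1$Alternatively, one may define the gradient flow in terms of the subdifferential.

Let $\mathcal{P}(\mathbb{R}^d)$ denote the set of Borel probability measures on $\mathbb{R}^d$. Given $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$, a Borel
map $T : \mathbb{R}^d \to \mathbb{R}^d$ transports $\mu$ onto $\nu$ if $\nu(B) = \mu(T^{-1}(B))$ for all Borel sets $B \subseteq \mathbb{R}^d$. We call $\nu$
the push-forward of $\mu$ under $T$ and write $\nu = T \# \mu$.

Now consider a measure $\boldsymbol{\mu} \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d)$. (We will distinguish probability measures on $\mathbb{R}^d \times \mathbb{R}^d$
from probability measures on $\mathbb{R}^d$ by writing them in bold font.) Let $\pi_1$ be the projection onto the
first component of $\mathbb{R}^d \times \mathbb{R}^d$, and let $\pi_2$ be the projection onto the second component. The first and
second marginals of $\boldsymbol{\mu}$ are $\pi_1 \# \boldsymbol{\mu}$ and $\pi_2 \# \boldsymbol{\mu}$, respectively.

Given $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$, the set of transport plans from $\mu$ to $\nu$ is

\[ \Gamma(\mu, \nu) := \{ \boldsymbol{\mu} \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d) : \pi_1 \# \boldsymbol{\mu} = \mu , \ \pi_2 \# \boldsymbol{\mu} = \nu \} . \]

The 2-Wasserstein distance between $\mu$ and $\nu$ is

\[ W_2(\mu, \nu) := \left( \inf \left\{ \int |x - y|^2 \, d\boldsymbol{\mu}(x, y) : \boldsymbol{\mu} \in \Gamma(\mu, \nu) \right\} \right)^{1/2} . \tag{1.3} \]

When $W_2(\mu, \nu) < \infty$, this infimum is attained, and we refer to the plans that attain the infimum
as optimal transport plans. We denote the set of optimal transport plans by $\Gamma_0(\mu, \nu)$.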

The 2-Wasserstein distance satisfies the triangle inequality and is non-negative, non-degenerate, 
and symmetric. However, $\mathcal{P}(\mathbb{R}^d)$ endowed with the 2-Wasserstein distance is not a metric space,
since there exist measures that are infinite distances apart. Let $\mathcal{P}_{\mu_0}(\mathbb{R}^d)$ be the subset of $\mathcal{P}(\mathbb{R}^d)$
consisting of measures that are a finite distance from some fixed Borel probability measure $\mu_0$, so
that, by the triangle inequality, $(\mathcal{P}_{\mu_0}(\mathbb{R}^d), W_2)$ is a metric space. As indicated by the notation, one
may take $\mu_0$ to be the initial conditions of a gradient flow. Note that when $\mu_0 = \delta_0$, the Dirac mass
at the origin, $\mathcal{P}_{\delta_0}(\mathbb{R}^d)$ is the subset of $\mathcal{P}(\mathbb{R}^d)$ with finite second moment.

We now define the 2-Wasserstein distance in a simpler setting. Let $\mathcal{P}_2(\mathbb{R}^d)$ denote the set of
probability measures with finite second moment and $\mathcal{P}_2^a(\mathbb{R}^d)$ denote the set of probability measures
with finite second moment that are absolutely continuous with respect to Lebesgue measure. If
$\mu \in \mathcal{P}_2^a(\mathbb{R}^d)$ and $\nu \in \mathcal{P}_2(\mathbb{R}^d)$, the 2-Wasserstein distance between $\mu$ and $\nu$ reduces to the form

\[ W_2(\mu, \nu) := \left( \inf \left\{ \int |x - T(x)|^2 \, d\mu(x) : T \# \mu = \nu \right\} \right)^{1/2} . \tag{1.4} \]

The Brenier-McCann theorem guarantees that the infimum in (1.4) is attained by $T = \nabla\varphi$, where
$\varphi : \mathbb{R}^d \to \mathbb{R}$ is convex and $\nabla\varphi$ is unique $\mu$-almost everywhere [15]. In particular,

\[ W_2^2(\mu, \nu) = \int |x - \nabla\varphi(x)|^2 \, d\mu(x) , \]

and we call $\nabla\varphi$ the optimal transport map from $\mu$ to $\nu$. To emphasize its dependence on $\mu$ and $\nu$,
we denote the optimal transport map from $\mu$ to $\nu$ by $t_\mu^\nu$.

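Given $\mu_1, \mu_2 \in \mathcal{P}(\mathbb{R}^d)$ with $W_2(\mu_1, \mu_2) < \infty$ and $\boldsymbol{\mu} \in \Gamma_0(\mu_1, \mu_2)$, a geodesic connecting $\mu_1$ and
$\mu_2 \in \mathcal{P}(\mathbb{R}^d)$ is a curve of the form

\[ \mu_\alpha^{1 \to 2} : [0, 1] \to \mathcal{P}(\mathbb{R}^d), \quad \mu_\alpha^{1 \to 2} = ((1 - \alpha)\pi_1 + \alpha\pi_2) \# \boldsymbol{\mu} . \]

As shown in [2, Theorem 7.2.2], this definition agrees with the metric space definition of a geodesic,
i.e. a curve $\mu_\alpha : [0, 1] \to \mathcal{P}(\mathbb{R}^d)$ with $W_2(\mu_0, \mu_1) < \infty$ such that $W_2(\mu_\alpha, \mu_\beta) = |\alpha - \beta| \, W_2(\mu_0, \mu_1)$.
If $\mu_1 \in \mathcal{P}_2^a(\mathbb{R}^d)$, $\mu_2 \in \mathcal{P}_2(\mathbb{R}^d)$, then the geodesic connecting $\mu_1$ and $\mu_2$ is unique and of the form

\[ \mu_\alpha^{1 \to 2} : [0, 1] \to \mathcal{P}_2(\mathbb{R}^d), \quad \mu_\alpha^{1 \to 2} = ((1 - \alpha)\,\mathrm{id} + \alpha\, t_{\mu_1}^{\mu_2}) \# \mu_1 , \]

where $\mathrm{id}(x) = x$ is the identity transformation.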
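
The constant-speed property $W_2(\mu_\alpha, \mu_\beta) = |\alpha - \beta| W_2(\mu_0, \mu_1)$ can be checked by hand in one dimension. The sketch below assumes two uniform empirical measures on the real line with the same number of atoms, for which the monotone (sorted) matching is an optimal plan. This discrete setting and the helper names `w2_1d` and `geodesic_1d` are assumptions made for illustration; such measures are not absolutely continuous, so the example uses the plan-based definition of a geodesic rather than the transport-map formula.

```python
def w2_1d(xs, ys):
    """2-Wasserstein distance between two uniform empirical measures on the line
    with equally many atoms; in 1D the sorted matching is an optimal plan."""
    xs, ys = sorted(xs), sorted(ys)
    return (sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

def geodesic_1d(xs, ys, alpha):
    """Atoms of ((1 - alpha) * pi_1 + alpha * pi_2) # plan for the sorted matching."""
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - alpha) * x + alpha * y for x, y in zip(xs, ys)]

mu1 = [0.0, 1.0, 4.0, -2.0]
mu2 = [3.0, 3.5, 10.0, 7.0]
total = w2_1d(mu1, mu2)
for a, b in [(0.0, 1.0), (0.25, 0.75), (0.1, 0.6)]:
    lhs = w2_1d(geodesic_1d(mu1, mu2, a), geodesic_1d(mu1, mu2, b))
    # The two printed values should agree: W2(mu_a, mu_b) = |a - b| * W2(mu1, mu2).
    print(a, b, lhs, abs(a - b) * total)
```

The check works because interpolating two sorted lists atom by atom keeps them sorted, so the sorted matching remains optimal along the whole curve.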

A functional $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ is $\lambda$-convex in the 2-Wasserstein metric if, for all $\mu_1, \mu_2 \in
\mathcal{P}_{\mu_0}(\mathbb{R}^d)$, there exists a geodesic connecting $\mu_1$ and $\mu_2$ along which $E$ is $\lambda$-convex:

\[ E(\mu_\alpha^{1 \to 2}) \le (1 - \alpha)E(\mu_1) + \alpha E(\mu_2) - \alpha(1 - \alpha)\frac{\lambda}{2} W_2^2(\mu_1, \mu_2). \tag{1.5} \]

If a functional is 0-convex, we simply call it convex.$^2$ If a functional is 0-convex and strict inequality
holds in (1.5) for all $\alpha \in (0, 1)$, we call it strictly convex.

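Given a functional $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ and $\tau > 0$, its Moreau-Yosida regularization is

\[ E_\tau(\mu) := \inf_{\nu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)} \left\{ \frac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\} \tag{1.6} \]

and the corresponding proximal set $J_\tau : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to 2^{\mathcal{P}_{\mu_0}(\mathbb{R}^d)}$ is

\[ J_\tau(\mu) := \operatorname*{argmin}_{\nu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)} \left\{ \frac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\} . \tag{1.7} \]

As before, if there is a unique element in $J_\tau(\mu)$, we denote it by $\mu_\tau$ and call it the proximal point.
Similarly, we call $\mu \mapsto \mu_\tau$ the proximal map. The properties of the Moreau-Yosida regularization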
and proximal map in the 2-Wasserstein metric will be the main focus of this paper. 

As in the Hilbertian case, one of the main reasons for interest in the Moreau-Yosida regulariza- 
tion and the proximal map in the 2-Wasserstein metric is their relation to gradient flow. When E 
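and $\mu$ are sufficiently smooth, the 2-Wasserstein gradient of $E$ at $\mu \in D(E)$ is

\[ \nabla_W E(\mu) = -\nabla \cdot \left( \mu \nabla \frac{\delta E}{\delta \rho}(\mu) \right), \tag{1.8} \]

where $\frac{\delta E}{\delta \rho}$ is the functional derivative of $E$ [19] [2, Chapters 8 and 10].$^3$ The gradient flow of $E$ is
the Cauchy problem

\[ \frac{d}{dt} \mu(t) = -\nabla_W E(\mu(t)), \quad \mu(0) \in D(E) = \{ \mu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d) : E(\mu) < \infty \}, \]

which is well-defined, as long as $\nabla_W E(\mu(t))$ exists along the flow $\mu(t)$.$^4$ We will sometimes refer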
to this as the continuous gradient flow in order to distinguish it from the discrete gradient flow we 
define below. 

Otto observed that $-\nabla \cdot \left( \mu \nabla \frac{\delta E}{\delta \rho}(\mu) \right)$ may be viewed as the gradient vector field on the "Riemannian
manifold of probability densities on $\mathbb{R}^d$" associated to the functional $E$, where the Riemannian
metric is the infinitesimal form of the 2-Wasserstein metric [18, 19]. (It is one of his insights that
the 2-Wasserstein metric is induced by a Riemannian metric.) In this metric, the length of the
gradient of $E$ at $\mu$ is given by

\[ |\nabla_W E(\mu)| = \left( \int \left| \nabla \frac{\delta E}{\delta \rho}(\mu) \right|^2 d\mu \right)^{1/2} . \tag{1.9} \]




$^2$It is also common to refer to convex functionals in the 2-Wasserstein metric as displacement convex [14].
$^3$Some authors (e.g. [2]) identify the tangent vector $\nabla_W E(\mu)$ with the gradient vector field $-\nabla \frac{\delta E}{\delta \rho}(\mu)$ on $\mathbb{R}^d$.
One gets Otto's representative from this by multiplying by $\mu$ and taking the divergence. The choice of representatives
is merely notational.
$^4$Alternatively, one may define gradient flow in terms of the subdifferential [2, Definition 11.1.1].

As in the Hilbertian case, the proximal map expresses the dynamics for discrete gradient flow.
When the proximal map $\mu \mapsto \mu_\tau$ is well-defined (which occurs under much weaker assumptions on
$E$ and $\mu$ than are needed to define the gradient, as we describe before equation (1.14) below) we
may define the discrete gradient flow sequence

\[ \mu_n = (\mu_{n-1})_\tau, \quad \mu_0 \in D(E) . \tag{1.10} \]

As before, we identify the proximal set $J_\tau(\mu)$ with its unique element $\mu_\tau$ and write $J_\tau^n$ to indicate
$n$ repeated applications of the proximal map.

One of the advantages of discrete gradient flow is that it is not necessary to make precise the
sense in which (1.8) defines a gradient vector field. This fact was emphasized by De Giorgi in his
theory of the metric derivative [8] and extensively developed by Ambrosio, Gigli, and Savaré [2,
Chapter 8]. We follow De Giorgi's lead, and all of the estimates we use involve only the length
of the gradient $|\nabla_W E(\mu)|$. In the case that $E$ and $\mu$ lack sufficient smoothness for (1.9) to be
well-defined, we will interpret the symbol $|\nabla_W E(\mu)|$ as the metric slope

\[ |\partial E|(\mu) := \limsup_{\nu \to \mu} \frac{(E(\mu) - E(\nu))^+}{W_2(\mu, \nu)} . \tag{1.11} \]

We use the heuristic notation $|\nabla_W E(\mu)|$ since, as demonstrated by Otto [18, 19], it is often
enlightening to think of $|\nabla_W E(\mu)|$ as coming from a Riemannian metric on $\mathcal{P}(\mathbb{R}^d)$.

In their recent book [2], Ambrosio, Gigli, and Savaré conduct a detailed study of gradient flow
and discrete gradient flow in the 2-Wasserstein metric for large classes of functionals, developing
the analogy with the Hilbert space theory. It would be too much to hope for a perfect analogy. For
example, in the Hilbert space context, if a functional $E$ is proper, lower semicontinuous, and convex,
then its Moreau-Yosida regularization $E_\tau$ is also convex. However, in the 2-Wasserstein metric, it
is well-known that even when $E$ satisfies analogous assumptions, $E_\tau$ is not always convex.$^5$ The
key technical difference between the two metrics is that while

\[ x \mapsto \frac{1}{2} \|x - y\|^2 \tag{1.12} \]

is 1-convex along geodesics,

\[ \mu \mapsto \frac{1}{2} W_2^2(\mu, \nu) \tag{1.13} \]

is not $\lambda$-convex along geodesics, for any $\lambda \in \mathbb{R}$, if the dimension of the underlying space is greater
than or equal to 2 [2, Example 9.1.5]. Since much of De Giorgi's "minimizing steps" approach to
gradient flow relies on the 1-convexity of (1.12), this lack of convexity in the 2-Wasserstein case
complicates the implementation of De Giorgi's scheme.

Ambrosio, Gigli, and Savaré circumvent this difficulty with their observation that, though $\mu \mapsto
\frac{1}{2} W_2^2(\mu, \nu)$ is not 1-convex along all geodesics, it is 1-convex along a different class of curves. They
define the set of generalized geodesics to be the union of these classes of curves over all $\nu \in \mathcal{P}(\mathbb{R}^d)$
(see Section 2.1). By considering functionals that are convex along generalized geodesics, a stronger
condition than merely being convex along geodesics (see Section 2.2), they deduce a priori estimates
that provide detailed control over the gradient flow and discrete gradient flow.



$^5$For the reader's convenience, we include an example in Section 3.

The key results that we will use concern functionals $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ that are proper,
coercive, lower semicontinuous, and $\lambda$-convex along generalized geodesics (see Section 2.2).$^6$ With
these assumptions, Ambrosio, Gigli, and Savaré show that if $\tau > 0$ is small enough so that $\lambda\tau > -1$,
then for all $\mu \in D(E)$ the proximal map

\[ \mu \mapsto \mu_\tau \tag{1.14} \]

and the discrete gradient flow sequence

\[ \mu_n = (\mu_{n-1})_\tau, \quad \mu_0 \in D(E), \]

are well-defined. They go on to prove the 2-Wasserstein analogue of the exponential formula (1.2)
relating the discrete gradient flow to the continuous gradient flow [2, Theorem 4.0.4]. If $\mu(t)$ is the
solution to the continuous gradient flow of $E$ with initial conditions $\mu(0) \in D(E)$, then

\[ \mu(t) = \lim_{n \to \infty} (J_{t/n})^n (\mu(0)) . \tag{1.15} \]

Using the assumption of convexity along generalized geodesics, Ambrosio, Gigli, and Savaré
comprehensively develop the theory of continuous gradient flow. While this assumption is stronger
than (standard) convexity along geodesics, it is not restrictive: all important examples of functionals
that are convex along geodesics are also convex along generalized geodesics [2, Section 9.3].

In this paper, we take a closer look at the Moreau-Yosida regularization and the proximal map
in the 2-Wasserstein metric for functionals that are convex along generalized geodesics. We show
that, while the Moreau-Yosida regularization does not preserve $E$'s convexity along all geodesics
(as in the Hilbertian case), if $E$ attains its minimum at $\bar\mu$, the Moreau-Yosida regularization does
satisfy an "above the tangent line" inequality at $\bar\mu$. This type of inequality is a necessary condition
for convexity. In particular, a function from $\mathbb{R}$ to $\mathbb{R}$ is convex if and only if it lies above its tangent
line at every point.

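1.1 THEOREM (Generalized convexity of $E_\tau$). Given $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ proper, coercive,
lower semicontinuous, and $\lambda$-convex along generalized geodesics with $\lambda \ge 0$, assume that $E$ attains
its minimum at $\bar\mu$. For $\tau > 0$, define $\lambda_\tau := \frac{\lambda}{1 + \lambda\tau}$. Then for all $\mu \in D(E)$, there exists a geodesic
$\mu_\alpha^{\bar\mu \to \mu}$ from $\bar\mu$ to $\mu$ such that

\[ E_\tau(\mu_\alpha^{\bar\mu \to \mu}) \le (1 - \alpha)E_\tau(\bar\mu) + \alpha E_\tau(\mu) - \alpha(1 - \alpha)\frac{\lambda_\tau}{2} W_2^2(\bar\mu, \mu). \tag{1.16} \]

In Section 4.1 we show that (1.16) is sharp by presenting an example in which $E$ is $\lambda$-convex and
$E_\tau$ is no more than $\lambda_\tau$-convex.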
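
To get a feel for the constant $\lambda_\tau = \lambda/(1+\lambda\tau)$, one can look at the simplest Hilbert space analogue. The sketch below assumes the space is the real line and $E(x) = \lambda x^2/2$; it approximates $E_\tau$ by a ternary search and compares it with $\lambda_\tau y^2/2$. This is a sanity check in a setting chosen for this illustration, not the example of Section 4.1, and the helper name `moreau_yosida` and the search interval are hypothetical.

```python
def moreau_yosida(E, y, tau, lo=-50.0, hi=50.0, iters=200):
    """Approximate E_tau(y) = min_x (x - y)**2 / (2 * tau) + E(x) over [lo, hi]
    by ternary search; this is valid here because the objective is strictly convex."""
    f = lambda x: (x - y) ** 2 / (2 * tau) + E(x)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)

lam, tau = 2.0, 0.5
lam_tau = lam / (1 + lam * tau)      # the constant appearing in Theorem 1.1
E = lambda x: lam * x * x / 2        # a lam-convex functional on the real line
for y in (-3.0, 0.5, 4.0):
    # The two printed values should agree: E_tau(y) = lam_tau * y**2 / 2.
    print(y, moreau_yosida(E, y, tau), lam_tau * y * y / 2)
```

In this one-dimensional case the minimizer is $x = y/(1+\lambda\tau)$, and substituting it gives $E_\tau(y) = \lambda_\tau y^2/2$ exactly, so the regularization is $\lambda_\tau$-convex and no better.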

As a consequence of Theorem 1.1, we show $E_\tau$ satisfies a Talagrand inequality and an HWI
inequality.

1.2 THEOREM (Talagrand and HWI Inequalities). Under the assumptions of Theorem 1.1,
for all $\mu \in D(E)$, we have the Talagrand inequality

\[ E_\tau(\mu) - E_\tau(\bar\mu) \ge \frac{\lambda_\tau}{2} W_2^2(\mu, \bar\mu) \tag{1.17} \]

and the HWI inequality

\[ E_\tau(\mu) - E_\tau(\bar\mu) \le |\nabla_W E_\tau(\mu)| \, W_2(\mu, \bar\mu) - \frac{\lambda_\tau}{2} W_2^2(\mu, \bar\mu) . \tag{1.18} \]

These inequalities capture $E_\tau$'s behavior at $\bar\mu$ from both ends of the "above the tangent line"
inequality.

$^6$Note that Ambrosio, Gigli, and Savaré often state their results in the context when $\mu_0 = \delta_0$, the Dirac mass at
the origin, so $\mathcal{P}_{\mu_0}(\mathbb{R}^d) = \mathcal{P}_2(\mathbb{R}^d)$. We quote their results in broader generality, since the proofs are easily adapted to
this case.

We also develop the analogy between Hilbertian metrics and the 2-Wasserstein metric by proving
a contraction inequality for the proximal map. In a Hilbert space, if $E$ is proper, lower
semicontinuous, and convex, Moreau [16] showed that the proximal map satisfies

\[ \|x_\tau - y_\tau\| \le \|x - y\| \quad \forall x, y \in \mathcal{H} . \tag{1.19} \]

This turns out to be a rather miraculous property of the Hilbertian norm that fails even in simple
Banach spaces. For example, consider the $\ell^\infty$ norm on $\mathbb{R}^2$. Fix two points $a = (0, 0)$ and $b = (1, 1)$,
and let $K$ be the closed half-space lying beneath the line $3x_2 = x_1 - 4$. Let $E$ be the indicator
function for $K$,

\[ E(x) := \begin{cases} 0 & \text{if } x = (x_1, x_2) \in K \\ \infty & \text{otherwise.} \end{cases} \]

Then

\[ J_\tau(y) := \operatorname*{argmin}_{x \in \mathbb{R}^2} \left\{ \frac{1}{2\tau} \|x - y\|_\infty^2 + E(x) \right\} = \operatorname*{argmin}_{x \in K} \left\{ \frac{1}{2\tau} \|x - y\|_\infty^2 \right\} . \]

Therefore, $J_\tau(a) = (1, -1)$ and $J_\tau(b) = (5/2, -1/2)$ for all $\tau > 0$. This is not a contraction since
$\|a - b\|_\infty = 1 < 3/2 = \|J_\tau(a) - J_\tau(b)\|_\infty$.

Figure 1: In the Banach space $\mathbb{R}^2$, endowed with the $\ell^\infty$ norm, the proximal map is not a contraction.
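
The $\ell^\infty$ counterexample is small enough to verify with exact rational arithmetic. The sketch below assumes the standard fact that the $\ell^\infty$ distance from $y$ to a half-space $\{x : n \cdot x \ge c\}$ is $(c - n \cdot y)/\|n\|_1$, and that when no component of $n$ vanishes the nearest point is the single corner $y + r\,(\operatorname{sign} n_1, \operatorname{sign} n_2)$ of the $\ell^\infty$ ball of radius $r$. The function names are hypothetical and introduced only for this check.

```python
from fractions import Fraction as F

def proj_linf_halfspace(y):
    """l-infinity nearest point of K = {x : x1 - 3*x2 >= 4} to y.

    With normal n = (1, -3), the distance is r = (4 - n.y) / ||n||_1 and the
    nearest point is the corner y + r * (1, -1) of the l-infinity ball.
    """
    gap = 4 - (y[0] - 3 * y[1])
    if gap <= 0:
        return y                      # y already lies in K
    r = F(gap) / 4                    # ||n||_1 = |1| + |-3| = 4
    return (y[0] + r, y[1] - r)

def linf(u, v):
    """l-infinity distance between two points of the plane."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))

a, b = (F(0), F(0)), (F(1), F(1))
# Minimizing ||x - y||^2 / (2 * tau) over K gives the nearest point for every tau > 0.
Ja, Jb = proj_linf_halfspace(a), proj_linf_halfspace(b)
print(Ja, Jb, linf(a, b), linf(Ja, Jb))
```

Since the proximal point of an indicator function is the nearest point of $K$, the step size $\tau$ plays no role, which matches the statement "for all $\tau > 0$" in the text.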



The situation for general metric spaces is even more involved than the situation for metrics
induced by norms, and one does not expect a contraction to hold. Nevertheless, if $E$ is appropriately
convex, the continuous time gradient flow defined by (1.15) is contractive [2, Theorem 4.0.4] [19].
This gives hope that some contraction property of the proximal map is present at the discrete level
and does not merely emerge in the limit.



CC October 10, 2012 






Our next result shows that this is the case. In particular, we achieve contraction of the proximal
map by making a small modification to the squared distance: given $\tau > 0$, we consider the functional
$\Lambda_\tau : \mathcal{P}(\mathbb{R}^d) \times \mathcal{P}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ defined by

\[ \Lambda_\tau(\mu, \nu) := W_2^2(\mu, \nu) + \frac{\tau^2}{2} |\nabla_W E(\mu)|^2 + \frac{\tau^2}{2} |\nabla_W E(\nu)|^2 . \tag{1.20} \]

As before, we interpret $|\nabla_W E(\mu)|$ as the metric slope (1.11) when $E$ and $\mu$ lack sufficient smoothness
for the norm of the 2-Wasserstein gradient (1.9) to be well-defined.

Though we state the following theorem in the context of the 2-Wasserstein metric, it continues
to hold in a more abstract setting: given a functional $E$ on a complete metric space $(X, d)$, if $E$
is proper, coercive, lower semicontinuous, and satisfies [2, Assumption 4.0.1] for some $\lambda \in \mathbb{R}$, then
the result remains true by replacing $W_2$ with $d$.

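1.3 THEOREM (Contraction of proximal map). Given $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ proper, coercive,
lower semicontinuous, and $\lambda$-convex along generalized geodesics, fix $\tau > 0$ small enough so that
$\lambda\tau > -1$. Consider $\mu, \nu \in D(E)$ and let $\Lambda_\tau : \mathcal{P}(\mathbb{R}^d) \times \mathcal{P}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ be given by (1.20). Then,
if $\lambda \ge 0$, the proximal map is contracting in $\Lambda_\tau$,

\[ \Lambda_\tau(\mu_\tau, \nu_\tau) \le \Lambda_\tau(\mu, \nu) . \tag{1.21} \]

More generally, for $\lambda \in \mathbb{R}$,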



Ar{Hr,l^T 



Xt 



(1.22) 



In Section 4.1, we show that the inequality (1.22) is sharp. Then, in Section 4.2 we apply (1.21)
together with scaling properties of the $W_2$ metric to derive sharp polynomial rates of convergence to
Barenblatt profiles for certain fast diffusion and porous medium equations. Otto originally deduced
these results in [19] by considering a modified gradient flow problem for $\lambda$-convex functionals with
$\lambda > 0$. The contraction inequality (1.21) provides a simple route to such results. The fast diffusion
and porous media equations also provide examples of strictly convex functionals for which the
proximal map is strictly contracting in $\Lambda_\tau$ but not in $W_2$.

1.4 Remark. While Ambrosio, Gigli, and Savaré do not explicitly consider monotonicity results
for modifications of the squared distance along the discrete gradient flow, such a result (for a
different modification) can be found by reading between the lines in [2, Lemma 4.2.4]. Consider
the alternative modification to the squared distance function defined by

\[ \tilde\Lambda_\tau(\mu, \nu) := W_2^2(\mu, \nu) + \tau E(\mu) + \tau E(\nu) . \tag{1.23} \]

If one takes the final inequality on [2, page 92] for $\lambda = 0$ and $n = 1$, rearranges terms, and
symmetrizes in $\mu$ and $\nu$, one obtains (1.21) with $\tilde\Lambda_\tau$ in place of $\Lambda_\tau$. A key difference between
$\tilde\Lambda_\tau$ and our functional $\Lambda_\tau$ is that, for measures $\mu$ and $\nu$ with $|\nabla_W E(\mu)|$ and $|\nabla_W E(\nu)| < \infty$, $\Lambda_\tau$
involves only an $O(\tau^2)$ correction to $W_2^2$, while $\tilde\Lambda_\tau$ involves an $O(\tau)$ correction to $W_2^2$.

1.5 Remark. While one might first suppose that $\Lambda_\tau$ could only be used to study discrete gradient
flows with initial data satisfying $|\nabla_W E(\mu)|, |\nabla_W E(\nu)| < \infty$, when $E$ is strictly convex, the
discrete gradient flow produces this regularity in one step (see Lemma 2.2). We shall see an
example of this in Section 4.2 when we apply Theorem 1.3 to the discrete gradient flow for the
Rényi entropies.

For $\lambda > 0$, one can extract from (1.22) a useful inequality that implies, among other things, an
optimal exponential rate of decrease of $\Lambda_\tau(\mu, \bar\mu)$ when $E$ has a minimizer $\bar\mu$ (necessarily unique due
to the strict convexity).

1.6 COROLLARY (The case $\lambda > 0$). Consider $\lambda > 0$ and $\tau > 0$ sufficiently small so that $\tau\lambda < 1$.
Then for all $E$ satisfying the hypotheses of Theorem 1.3 and $\mu, \nu \in D(E)$,

\[ (1 + \tau\lambda)\Lambda_\tau(\mu_\tau, \nu_\tau) \le (1 - \tau\lambda)\Lambda_\tau(\mu, \nu) + 3\lambda\tau \, \Lambda_\tau^{1/2}(\mu, \nu) \left[ W_2(\mu, \mu_\tau) + W_2(\nu, \nu_\tau) \right] . \tag{1.24} \]

We give the proof of this corollary in Section 3. However, to explain its consequences, we state
and prove a simple discrete Gronwall type inequality. It is a discrete version of the continuous time
inequality [2, Lemma 4.1.8]. (See [3] and [9] for related discrete Gronwall inequalities.)

1.7 LEMMA (A discrete Gronwall type inequality). Let $\lambda, \tau > 0$, and let $\{a_n\}$ and $\{b_n\}$ be two
sequences of non-negative numbers such that for all $n > 0$,

\[ (1 + \tau\lambda)a_n \le (1 - \tau\lambda)a_{n-1} + \tau a_{n-1}^{1/2} b_n . \tag{1.25} \]

Then,

\[ a_n^{1/2} \le (1 + \lambda\tau)^{-n} a_0^{1/2} + \sqrt{\frac{\tau}{2\lambda}} \, (1 + \lambda\tau) \left( \sum_{k=1}^n b_k^2 \right)^{1/2} . \]



Consider the discrete gradient flow of $E$ starting from $\mu \in D(E)$ with $\tau > 0$ and $\tau\lambda < 1$. Let
$\mu_0 := \mu$ and inductively define $\{\mu_n\}$ by repeated application of the proximal map. Define $\{\nu_n\}$ in
the same way, starting from $\nu \in D(E)$. Now, apply Lemma 1.7 and Corollary 1.6 to these discrete
gradient flows of $E$, taking

\[ a_n := \Lambda_\tau(\mu_n, \nu_n) \quad \text{and} \quad b_n := 3\lambda \sqrt{2W_2^2(\mu_{n-1}, \mu_n) + 2W_2^2(\nu_{n-1}, \nu_n)} . \]

Since

\[ W_2^2(\mu, \mu_\tau) \le 2\tau [E(\mu) - E(\mu_\tau)] , \tag{1.26} \]

$\sum_{k=1}^n b_k^2$ is bounded by a telescoping sum: $\sum_{k=1}^n b_k^2 \le 36\lambda^2\tau \left[ (E(\mu) - E(\mu_n)) + (E(\nu) - E(\nu_n)) \right]$. In
case $E$ is bounded below, we may assume without loss of generality that $E$ is non-negative. Then,

\[ \Lambda_\tau^{1/2}(\mu_n, \nu_n) \le (1 + \lambda\tau)^{-n} \Lambda_\tau^{1/2}(\mu, \nu) + \lambda\tau \, \frac{6(1 + \lambda\tau)}{\sqrt{2\lambda}} \sqrt{E(\mu) + E(\nu)} . \tag{1.27} \]

Thus, for positive $\lambda$ and sufficiently small $\tau$, $\Lambda_\tau^{1/2}(\mu_n, \nu_n)$ decays "exponentially fast" at rate $\lambda$ up
to the time that this quantity becomes $O(\tau)$.$^7$



The proof of Lemma 1.7 is elementary, so we provide it here, closing this section. 



At this point, we may use the bound £(^i„) < (1 + A r)~^" [1 Theorem 3.1.6] and apply (|l.27| iteratively. 
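Proof of Lemma 1.7. Multiply both sides of (1.25) by $(1 + \tau\lambda)^{2n-1}$ to obtain

\[ (1 + \tau\lambda)^{2n} a_n \le (1 - (\tau\lambda)^2)(1 + \tau\lambda)^{2n-2} a_{n-1} + \tau \left( (1 + \tau\lambda)^{2n-2} a_{n-1} \right)^{1/2} (1 + \tau\lambda)^n b_n . \]

Defining

\[ \tilde a_n := (1 + \tau\lambda)^{2n} a_n \quad \text{and} \quad \tilde b_n := \tau (1 + \tau\lambda)^n b_n , \]

we have $\tilde a_n \le \tilde a_{n-1} + \tilde a_{n-1}^{1/2} \tilde b_n$, and therefore $\tilde a_n \le \tilde a_0 + \sum_{k=1}^n \tilde a_{k-1}^{1/2} \tilde b_k$. Defining

\[ c_n := \max\{ \tilde a_k : 0 \le k \le n \} , \]

we have $c_n \le \tilde a_0 + c_n^{1/2} \sum_{k=1}^n \tilde b_k$. This quadratic inequality implies that $c_n^{1/2} \le \tilde a_0^{1/2} + \sum_{k=1}^n \tilde b_k$. By the
Cauchy-Schwarz inequality, and the fact that for $\alpha := (1 + \lambda\tau)^2 > 1$, $\sum_{k=1}^n \alpha^k \le \frac{1}{\alpha - 1} \alpha^{n+1}$, the claimed
bound follows, since $\alpha - 1 = \lambda\tau(2 + \lambda\tau) \ge 2\lambda\tau$.

□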
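
The discrete Gronwall bound is easy to test numerically. The sketch below assumes the bound in the form $a_n^{1/2} \le (1+\lambda\tau)^{-n} a_0^{1/2} + \sqrt{\tau/(2\lambda)}\,(1+\lambda\tau)\big(\sum_{k=1}^n b_k^2\big)^{1/2}$, builds a sequence $a_n$ satisfying (1.25) with equality for randomly chosen $b_n \ge 0$, and records the smallest slack in the bound. The function name `gronwall_gap`, the random ranges, and the restriction $\tau\lambda < 1$ (which keeps $a_n \ge 0$) are assumptions of this illustration.

```python
import random

def gronwall_gap(lam, tau, n_steps, seed=0):
    """Build a_n with equality in (1.25) for random b_n >= 0 and return the
    smallest value of (bound on a_n**0.5) - a_n**0.5 over n = 1..n_steps."""
    rng = random.Random(seed)
    a0 = rng.uniform(0.0, 5.0)
    a, sum_b2, gap = a0, 0.0, float("inf")
    for n in range(1, n_steps + 1):
        b = rng.uniform(0.0, 2.0)
        # Equality case of (1.25): the largest a_n the hypothesis allows.
        a = ((1 - tau * lam) * a + tau * a ** 0.5 * b) / (1 + tau * lam)
        sum_b2 += b * b
        bound = ((1 + lam * tau) ** (-n) * a0 ** 0.5
                 + (tau / (2 * lam)) ** 0.5 * (1 + lam * tau) * sum_b2 ** 0.5)
        gap = min(gap, bound - a ** 0.5)
    return gap

# A non-negative result means the bound held at every step of this run.
print(gronwall_gap(lam=1.0, tau=0.1, n_steps=200))
```

Taking equality in (1.25) is the hardest case for the bound, since any sequence satisfying the inequality is dominated step by step by the one constructed here.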

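2 Generalized Convexity and the Proximal Map

2.1 Generalized Geodesics

In a Hilbert space, $x \mapsto \frac{1}{2} \|x - y\|^2$ is 1-convex along geodesics. However, the same is not true for
the squared 2-Wasserstein distance when the dimension of the underlying space is greater than or
equal to 2 [2, Example 9.1.5]. Instead, Ambrosio, Gigli, and Savaré observe that $\mu \mapsto \frac{1}{2} W_2^2(\mu, \nu)$ is
convex along a different set of curves, which we now describe.

Fix $\mu_1, \mu_2, \mu_3 \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)$ with optimal plans $\boldsymbol{\mu}_{1,2} \in \Gamma_0(\mu_1, \mu_2)$, $\boldsymbol{\mu}_{1,3} \in \Gamma_0(\mu_1, \mu_3)$. For
$1 \le i < j \le 3$, let $\pi_{i,j}$ be the projection onto the $i$th and $j$th components of $\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d$. Fix
$\boldsymbol{\mu} \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d)$ so that $\pi_{1,2} \# \boldsymbol{\mu} = \boldsymbol{\mu}_{1,2}$ and $\pi_{1,3} \# \boldsymbol{\mu} = \boldsymbol{\mu}_{1,3}$ [2, Lemma 5.3.2]. (We use bold
font to distinguish probability measures on $\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d$ or $\mathbb{R}^d \times \mathbb{R}^d$ from probability measures on
$\mathbb{R}^d$.) As in [2, Definition 9.2.2], a generalized geodesic joining $\mu_2$ to $\mu_3$ with base $\mu_1$ is a curve of
the form

\[ \mu_\alpha^{2 \to 3} : [0, 1] \to \mathcal{P}(\mathbb{R}^d), \quad \mu_\alpha^{2 \to 3} = ((1 - \alpha)\pi_2 + \alpha\pi_3) \# \boldsymbol{\mu} . \]

In the case $\mu_1 \in \mathcal{P}_2^a(\mathbb{R}^d)$ and $\mu_2, \mu_3 \in \mathcal{P}_2(\mathbb{R}^d)$, this reduces to

\[ \mu_\alpha^{2 \to 3} : [0, 1] \to \mathcal{P}(\mathbb{R}^d), \quad \mu_\alpha^{2 \to 3} = \left( (1 - \alpha)\, t_{\mu_1}^{\mu_2} + \alpha\, t_{\mu_1}^{\mu_3} \right) \# \mu_1 . \]

Ambrosio, Gigli, and Savaré demonstrate that $\mu \mapsto \frac{1}{2} W_2^2(\mu, \mu_1)$ is 1-convex along any generalized
geodesic $\mu_\alpha^{2 \to 3}$ with base $\mu_1$, for all $\mu_2, \mu_3 \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)$ [2, Lemma 9.2.1]. Note that if the base $\mu_1$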
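equals either $\mu_2$ or $\mu_3$, $\mu_\alpha^{2 \to 3}$ is a (standard) geodesic joining $\mu_2$ and $\mu_3$. Thus, while $\mu \mapsto \frac{1}{2} W_2^2(\mu, \mu_1)$
is not convex along geodesics (in the sense that it is not convex along all geodesics), it is convex
along some geodesics.

2.2 Functionals $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$

Fix a Borel probability measure $\mu_0$. We consider functionals $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ that satisfy
the following conditions:

• proper: $D(E) := \{ \mu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d) : E(\mu) < \infty \} \ne \emptyset$.

• coercive:$^8$ There exists $\tau_* > 0$ such that for all $0 < \tau < \tau_*$, $\mu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)$,

\[ E_\tau(\mu) = \inf_{\nu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)} \left\{ \frac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\} > -\infty . \]

As noted in [2, Lemma 2.2.1], by a triangle inequality argument, it is enough to check that
there exists $\tau_0 > 0$ such that

\[ E_{\tau_0}(\mu_0) = \inf_{\nu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)} \left\{ \frac{1}{2\tau_0} W_2^2(\mu_0, \nu) + E(\nu) \right\} > -\infty . \tag{2.1} \]

• lower semicontinuous: For all $\mu_n, \mu \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)$ such that $\mu_n \to \mu$ in $W_2$,

\[ \liminf_{n \to \infty} E(\mu_n) \ge E(\mu) . \]

• $\lambda$-convex along generalized geodesics: For any $\mu_1, \mu_2, \mu_3 \in \mathcal{P}_{\mu_0}(\mathbb{R}^d)$, there exists a generalized
geodesic $\mu_\alpha^{2 \to 3}$ from $\mu_2$ to $\mu_3$ with base $\mu_1$ such that for all $\alpha \in [0, 1]$,

\[ E(\mu_\alpha^{2 \to 3}) \le (1 - \alpha)E(\mu_2) + \alpha E(\mu_3) - \alpha(1 - \alpha)\frac{\lambda}{2} \int |x_2 - x_3|^2 \, d\boldsymbol{\mu}(x) . \tag{2.2} \]

Note that, for $\lambda > 0$, this condition is stronger than requiring that $E(\mu_\alpha^{2 \to 3})$, considered as a
real-valued function of $\alpha \in [0, 1]$, be $\lambda W_2^2(\mu_2, \mu_3)$ convex, since

\[ \int |x_2 - x_3|^2 \, d\boldsymbol{\mu} \ge W_2^2(\mu_2, \mu_3) . \]

If $E$ is $\lambda$-convex along generalized geodesics, then in particular it is $\lambda$-convex: for any $\mu_1, \mu_2 \in
\mathcal{P}_{\mu_0}(\mathbb{R}^d)$, there exists a geodesic $\mu_\alpha^{1 \to 2}$ from $\mu_1$ to $\mu_2$ such that for all $\alpha \in [0, 1]$

\[ E(\mu_\alpha^{1 \to 2}) \le (1 - \alpha)E(\mu_1) + \alpha E(\mu_2) - \alpha(1 - \alpha)\frac{\lambda}{2} W_2^2(\mu_1, \mu_2) . \]

This is equivalent to $E(\mu_\alpha^{1 \to 2})$, considered as a real-valued function of $\alpha \in [0, 1]$, being $\lambda W_2^2(\mu_1, \mu_2)$
convex [2, Remark 9.1.2].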
The requirement that a functional $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ be proper, coercive, lower semicontinuous,
and convex along generalized geodesics is the natural analogue of the Hilbertian requirement that
a functional $E : \mathcal{H} \to \mathbb{R} \cup \{\infty\}$ be proper, lower semicontinuous, and convex. The two differences
are the addition of the coercivity assumption and the strengthening of the convexity assumption.
In a Hilbert space $\mathcal{H}$, all functionals that are proper, lower semicontinuous, and convex are also
coercive (in this sense), so the addition of the coercivity assumption is a natural way to ensure that
the 2-Wasserstein Moreau-Yosida regularization is not identically $-\infty$. The convexity assumption
is strengthened because convexity along generalized geodesics is the useful 2-Wasserstein analogue
of Hilbertian convexity. While in a Hilbert space, $x \mapsto \frac{1}{2} \|x - y\|^2$ is 1-convex along all geodesics,
the same does not hold for the 2-Wasserstein metric. Requiring convexity of the functional on a
larger class of curves compensates for the weaker convexity of $W_2^2$.

$^8$In the case $\mu_0 = \delta_0$, the Dirac mass at the origin, this is equivalent to the definition of coercivity in [2], where
Ambrosio, Gigli, and Savaré require that there exist some $\tau_* > 0$ and $\mu_* \in \mathcal{P}_2(\mathbb{R}^d)$ such that

\[ \inf_{\nu \in \mathcal{P}_2(\mathbb{R}^d)} \left\{ \frac{1}{2\tau_*} W_2^2(\mu_*, \nu) + E(\nu) \right\} > -\infty . \]



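2.3 Further Results About the Proximal Map

In the following theorem, we collect some key results from [2, Theorem 4.1.2, Corollary 4.1.3]
regarding the proximal map.

2.1 THEOREM. Given $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ proper, coercive, lower semicontinuous, and
$\lambda$-convex along generalized geodesics, fix $\tau > 0$ small enough so that $\tau\lambda > -1$. Then, for $\mu \in D(E)$,
the proximal map

\[ \mu \mapsto \mu_\tau \]

is well-defined. Furthermore, the following variational inequality holds:

\[ \frac{1}{2\tau} \left[ W_2^2(\mu_\tau, \nu) - W_2^2(\mu, \nu) \right] + \frac{\lambda}{2} W_2^2(\mu_\tau, \nu) \le E(\nu) - E(\mu_\tau) - \frac{1}{2\tau} W_2^2(\mu, \mu_\tau), \quad \forall \nu \in D(E). \tag{2.3} \]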



When the proximal map is well-defined, it satisfies an Euler-Lagrange equation, a fact originally
observed by Otto in [18, 19]. We state this result in the framework of [2, Lemma 10.1.2].

2.2 LEMMA. Given $E : \mathcal{P}_{\mu_0}(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ proper, coercive, lower semicontinuous, and $\lambda$-convex
along generalized geodesics, fix $\tau > 0$ small enough so that $\tau\lambda > -1$. Assume that $\mu \in D(E)$ so
$\mu \mapsto \mu_\tau$ is well-defined by Theorem 2.1. Then,

\[ \tau |\nabla_W E(\mu_\tau)| \le W_2(\mu, \mu_\tau) . \tag{2.4} \]

We may interpret $|\nabla_W E(\mu_\tau)|$ as the metric slope when $E$ and $\mu$ lack sufficient smoothness
for the norm of the 2-Wasserstein gradient (1.9) to be well-defined.

On the other hand, if $\mu \in \mathcal{P}_2^a(\mathbb{R}^d)$ and $E$ and $\mu_\tau$ are smooth enough so that the
2-Wasserstein gradient $\nabla_W E(\mu_\tau)$ is well-defined by (1.8), then

\[ t_{\mu_\tau}^{\mu} = \mathrm{id} + \tau \nabla \frac{\delta E}{\delta \rho}(\mu_\tau) \tag{2.5} \]

$\mu_\tau$-almost everywhere and

\[ \tau |\nabla_W E(\mu_\tau)| = W_2(\mu, \mu_\tau) . \tag{2.6} \]

Proof. (2.4) follows from [2, Theorem 3.1.6].

(2.5) follows from [2, Lemma 10.1.2] and the fact that, when $E$ is differentiable, $\nabla \frac{\delta E}{\delta \rho}(\mu_\tau)$ is the
unique element of its subdifferential at $\mu_\tau$.

(2.6) follows from (2.5) by considering the $L^2(\mu_\tau)$ norm of $t_{\mu_\tau}^{\mu} - \mathrm{id} = \tau \nabla \frac{\delta E}{\delta \rho}(\mu_\tau)$.

□

3 Proofs of Theorems 1.2 and 1.3 and Corollary 1.6



We now prove the theorems and corollaries announced in the introduction, turning first to the
generalized convexity of $E_\tau$. In a Hilbert space, if $E$ is proper, lower semicontinuous, and convex,
then its Moreau-Yosida regularization $E_\tau$ is also convex. It is well known that the exact analogue
in the 2-Wasserstein metric is false. For lack of a reference, we provide the following example.
Fix $\mu_0 \in \mathcal{P}_2(\mathbb{R}^d)$ and define $E : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ by

\[ E(\mu) := \begin{cases} 0 & \text{if } \mu = \mu_0 \\ \infty & \text{otherwise.} \end{cases} \tag{3.1} \]

$E$ is proper, coercive, lower semicontinuous, and convex along all curves in $\mathcal{P}_2(\mathbb{R}^d)$. In particular,
$E$ is convex along generalized geodesics. By definition,

\[ E_\tau(\mu) = \inf_{\nu \in \mathcal{P}_2(\mathbb{R}^d)} \left\{ \frac{1}{2\tau} W_2^2(\mu, \nu) + E(\nu) \right\} = \frac{1}{2\tau} W_2^2(\mu, \mu_0) . \]

By [2, Example 9.1.5], when the dimension of the underlying space satisfies $d \ge 2$, $E_\tau$ is not
$\lambda$-convex along geodesics for any $\lambda \in \mathbb{R}$.

As demonstrated by the previous example, the convexity of $E_\tau$ is related to the convexity of
the squared 2-Wasserstein distance. This also holds in the Hilbertian case, where the convexity of
$E_\tau$ is a consequence of the 1-convexity of $x \mapsto \frac{1}{2} \|x - y\|^2$ [17]. Therefore, it is natural that our
proof of the convexity inequality for $E_\tau$ requires the following convexity inequality for $W_2^2$.

3.1 LEMMA (Convexity inequality for Fix three measures ^i,/i2,A'3 G 7^(M'^) that are a 

finite 2-Wasserstein distance apart. Let fJ-l^^ be a generalized geodesic from fii to with base 
point fi2, 



^i^^ := ((1 - + a7r3)#/x , 



where fj, £ V{1 



satisfies n^ o ■= 7ri,2#/^ e ro(/ii, /U2) and 3 := vr2,3#/x G ro(Ai2, Ms)- 



Let fJ-a ^6 the geodesic from /ii to ^2 defined by 



Then, 



Wiit^lr^ isl-^^) < il-a)Wi{fiul^i) + aWi{i^2,f^3)-ail-a)Wiifi2,t^3). 



(3.2) 



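Proof. Note that

\[ ((1-\alpha)\pi_1 + \alpha\pi_2) \# \mu_{1,2} = ((1-\alpha)\pi_1 + \alpha\pi_2) \# \boldsymbol{\mu} . \]

Then by [2, Equation 7.1.6],

\begin{align*}
W_2^2(\mu_\alpha^{1 \to 3}, \mu_\alpha^{1 \to 2})
&\le \int \left| [(1-\alpha)\pi_1 + \alpha\pi_3] - [(1-\alpha)\pi_1 + \alpha\pi_2] \right|^2 d\boldsymbol{\mu} \\
&= \alpha^2 \int |\pi_2 - \pi_3|^2 \, d\boldsymbol{\mu} \\
&= \alpha^2 \int |\pi_2 - \pi_3|^2 \, d\mu_{2,3} \\
&= \alpha^2 W_2^2(\mu_2,\mu_3) \\
&= (1-\alpha) W_2^2(\mu_1,\mu_1) + \alpha W_2^2(\mu_2,\mu_3) - \alpha(1-\alpha) W_2^2(\mu_2,\mu_3) .
\end{align*}

□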

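We now use this convexity inequality for $W_2^2$ to prove Theorem 1.1.

Proof of Theorem 1.1. Since $E$ is proper, coercive, lower semicontinuous, and $\lambda$-convex along
generalized geodesics for $\lambda > 0$, by Theorem 2.1 the proximal map $\mu \mapsto \mu_\tau$ is well-defined for $\mu \in D(E)$
and $\tau > 0$. Let $\mu_\alpha^{\bar\mu \to \mu_\tau}$ be the generalized geodesic from $\bar\mu$ to $\mu_\tau$ with base point $\mu$ on which $E$
satisfies equation (2.2). Defining $\mu_1 := \bar\mu$, $\mu_2 := \mu$, and $\mu_3 := \mu_\tau$, let $\mu_\alpha^{\bar\mu \to \mu}$ be the geodesic from $\bar\mu$
to $\mu$ described in Lemma 3.1. By Lemma 3.1,

\[ W_2^2(\mu_\alpha^{\bar\mu \to \mu_\tau}, \mu_\alpha^{\bar\mu \to \mu}) \le (1-\alpha) W_2^2(\bar\mu,\bar\mu) + \alpha W_2^2(\mu,\mu_\tau) - \alpha(1-\alpha) W_2^2(\mu,\mu_\tau) . \]

This allows us to bound $E_\tau(\mu_\alpha^{\bar\mu \to \mu})$ from above:

\begin{align*}
E_\tau(\mu_\alpha^{\bar\mu \to \mu})
&= \inf_{\nu \in \mathcal{P}_2(\mathbb{R}^d)} \left\{ \frac{1}{2\tau} W_2^2(\mu_\alpha^{\bar\mu \to \mu}, \nu) + E(\nu) \right\} \\
&\le \frac{1}{2\tau} \left( (1-\alpha) W_2^2(\bar\mu,\bar\mu) + \alpha W_2^2(\mu,\mu_\tau) - \alpha(1-\alpha) W_2^2(\mu,\mu_\tau) \right) \\
&\quad + (1-\alpha) E(\bar\mu) + \alpha E(\mu_\tau) - \alpha(1-\alpha) \frac{\lambda}{2} W_2^2(\bar\mu,\mu_\tau) \\
&= (1-\alpha) E_\tau(\bar\mu) + \alpha E_\tau(\mu) - \alpha(1-\alpha) \left( \frac{1}{2\tau} W_2^2(\mu,\mu_\tau) + \frac{\lambda}{2} W_2^2(\bar\mu,\mu_\tau) \right) .
\end{align*}

In the first inequality, we took $\nu = \mu_\alpha^{\bar\mu \to \mu_\tau}$ and applied (2.2). In the last step, we used that
$(\bar\mu)_\tau = \bar\mu$, since $E$ attains its minimum at $\bar\mu$, so that $E_\tau(\bar\mu) = \frac{1}{2\tau} W_2^2(\bar\mu,\bar\mu) + E(\bar\mu)$. Now, we apply

\[ \alpha a^2 + \beta b^2 \ge \frac{\alpha\beta}{\alpha+\beta} (a+b)^2 , \quad \text{for } \alpha > 0, \ \beta > 0 , \]

with $\alpha = 1/\tau$ and $\beta = \lambda$, so that $\frac{\alpha\beta}{\alpha+\beta} = \frac{\lambda}{1+\lambda\tau} = \lambda_\tau$. Together with the triangle inequality, this gives

\begin{align*}
E_\tau(\mu_\alpha^{\bar\mu \to \mu})
&\le (1-\alpha) E_\tau(\bar\mu) + \alpha E_\tau(\mu) - \alpha(1-\alpha) \frac{\lambda_\tau}{2} \left( W_2(\mu,\mu_\tau) + W_2(\bar\mu,\mu_\tau) \right)^2 \\
&\le (1-\alpha) E_\tau(\bar\mu) + \alpha E_\tau(\mu) - \alpha(1-\alpha) \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) .
\end{align*}

□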
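The elementary inequality used in the proof above can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: the function names `weighted_square_gap` and `lambda_tau` are illustrative choices, and the random sampling ranges are arbitrary. The gap equals $(\alpha a - \beta b)^2/(\alpha+\beta)$, so it should be nonnegative up to rounding and vanish when $\alpha a = \beta b$.

```python
import random


def weighted_square_gap(alpha, beta, a, b):
    """Return alpha*a^2 + beta*b^2 - alpha*beta/(alpha+beta) * (a+b)^2."""
    return alpha * a**2 + beta * b**2 - alpha * beta / (alpha + beta) * (a + b) ** 2


def lambda_tau(lam, tau):
    """Modulus obtained from the inequality with alpha = 1/tau and beta = lam."""
    return lam / (1.0 + lam * tau)


# Sample the gap at random points; it should never be meaningfully negative.
rng = random.Random(0)
min_gap = min(
    weighted_square_gap(
        rng.uniform(0.01, 10.0),  # alpha > 0
        rng.uniform(0.01, 10.0),  # beta > 0
        rng.uniform(-5.0, 5.0),   # a
        rng.uniform(-5.0, 5.0),   # b
    )
    for _ in range(10_000)
)
```

The choice $\alpha = 1/\tau$, $\beta = \lambda$ reproduces the constant $\lambda/(1+\lambda\tau)$ that multiplies $(a+b)^2$ in the proof.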



We now use this convexity inequality to prove Theorem 1.2.

Proof of Theorem 1.2. We first prove the Talagrand inequality. Since $E$ attains its minimum at $\bar\mu$,
so does $E_\tau$. Therefore, (1.16) implies that, for all $\mu \in D(E)$,

\[ E_\tau(\bar\mu) \le E_\tau(\mu_\alpha^{\bar\mu \to \mu}) \le (1-\alpha) E_\tau(\bar\mu) + \alpha E_\tau(\mu) - \alpha(1-\alpha) \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) . \]

Rearranging gives

\[ \alpha(1-\alpha) \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) \le \alpha \left( E_\tau(\mu) - E_\tau(\bar\mu) \right) . \]

Thus, for all $\alpha \in (0,1)$,

\[ (1-\alpha) \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) \le E_\tau(\mu) - E_\tau(\bar\mu) . \]

Sending $\alpha \to 0$ gives the Talagrand inequality (1.17).



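We now prove the HWI inequality. Again by (1.16), for all $\mu \in D(E)$,

\[ E_\tau(\mu_\alpha^{\bar\mu \to \mu}) \le (1-\alpha) E_\tau(\bar\mu) + \alpha E_\tau(\mu) - \alpha(1-\alpha) \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) . \]

Rearranging and using $(1-\alpha) W_2(\bar\mu,\mu) = W_2(\mu, \mu_\alpha^{\bar\mu \to \mu})$ gives, for $\alpha \in (0,1)$,

\begin{align*}
(1-\alpha) E_\tau(\mu) - (1-\alpha) E_\tau(\bar\mu) &\le E_\tau(\mu) - E_\tau(\mu_\alpha^{\bar\mu \to \mu}) - \alpha(1-\alpha) \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) , \\
E_\tau(\mu) - E_\tau(\bar\mu) &\le \frac{E_\tau(\mu) - E_\tau(\mu_\alpha^{\bar\mu \to \mu})}{1-\alpha} - \alpha \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) , \\
E_\tau(\mu) - E_\tau(\bar\mu) &\le \frac{E_\tau(\mu) - E_\tau(\mu_\alpha^{\bar\mu \to \mu})}{W_2(\mu, \mu_\alpha^{\bar\mu \to \mu})} W_2(\bar\mu,\mu) - \alpha \frac{\lambda_\tau}{2} W_2^2(\bar\mu,\mu) .
\end{align*}

Sending $\alpha \to 1$ gives the HWI inequality (1.18).

□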



Finally, we turn to the proof of Theorem 1.3.

Proof of Theorem 1.3. By Theorem 2.1, replacing $\nu$ with $\nu_\tau$,
1 



Similarly, 

^ (T^|(z^.,/i) - T^K^,^)) + < E{^l) - E{ur) - l:^W^{v,Vr). 

Adding these and multiplying by 2r gives 

Wii^ir, vr) - Wiiu, fi) + Ar [W^K/x,, ly^) + W^{fi, Ur)] < 2r [E{fi) - E{f,r)] - fi^) - W^{u, 

Symmetrically, we also have 



2 V^' 
Averaging gives

< T [E{u) - E{ur) + E{fi) - E{fir)] - Wiifi, fir) " Wi{v, Vr). 
This allows us to bound the change in ^rif^-, ^) from above: 



2 2 

-VF|(/i,z.) - :^|Vw.i^(;u)|2 - ^-\VwE{v)\'' 

< T [E{U) - E{Vr) + E{ll) - E{fir)] - Wiifl, fir) " W^U, Ur) 
r-'i ^2 ^2 ^2 



r 
Ar 



By [2, Equation 10.1.7, Lemma 10.1.5] and Hölder's inequality, the $\lambda$-convexity of $E$ implies

\[ E(\nu) - E(\nu_\tau) \le |\nabla_W E(\nu)| \, W_2(\nu,\nu_\tau) - \frac{\lambda}{2} W_2^2(\nu,\nu_\tau) . \tag{3.3} \]

Combining this with the Euler-Lagrange equation (2.4),

Arifir, Vr) - Ar(//, v) < t\V w E{u)\W2iu, Z^r) + T | V^y ^(/i) | W2 (/i, fir) - fir) - W^{u, Ur) 

+ ^Wiifi,fir) + lwi{u,ur) - ^\VwEifi)\^ - ^\VwE{u)\'' 

[2Wi{fir, Vr) + Wiifi, Vr) + WHy, fir)\ " ^ ^liv, Ur) + fir) 

Completing the square gives the result: 

Kifir, Vr) - Ar{fi, v) < -^{t\VwE{v)\ - W2{u, ^r))^ " ^{T\VwE{fi)\ - W2{fi, fir))^ 



Ar 



[2W^{fir, Vr) + Wl{fi, Vr) + Wl{y, fir) + W^U, Ur) + W^{fi, fir) 



□ 



Proof of Corollary 1.6. First, we use $\lambda > 0$ and the Euler-Lagrange equation (2.4) to rewrite (1.22):

Kipir, Vr) - Ar(Ai, v) < -^{t\VwE{u)\ - W2{v, Ur))^ - ^{T\VwE{fi)\ - W2{fi, fir)? 



Ar 



[2Wi{fir, Ur) + Wiifi, Ur) + W^U, fir) + AVwE{Ur)\'' + A^wEifir) 



-]^{r\VwE{u)\-W2{u,Ur))'' - \{T\VwE{fi)\-W2{fi,fir))^ 

Ar 



[2K{fir, Vr) + Wiifi, Ur) + W^{u, fir)] . 



Rearranging terms, we have 



(1 + Ar)A,(^,,z.O < Kif^.v) - ]^{t\VwE{u)\ - W2(v,Vr)? - \{T\VwE{fi)\ - W2{fl,flr)? 



Ar 



-^^l{l^.Vr)^Wl{u,fXr)\ . 



(3.4) 
By the triangle inequality,

and we have a similar bound for W|(^t-, v). 
Finally, for At < 1, 

1 /r^ 

-{T\VwE{^i)\-W2{^l,^lr)? > Xt i—\VwE{fl)\'' -T\VwE{l^)\W2{fir,f^) 



and again we have the same inequality with fj, in place of u. Using these inequalities in (3.4) we 
obtain the desired bound. □ 



4 Examples and Applications

4.1 Inequalities (1.16) and (1.22) are Sharp

Our first example shows that the inequality (1.16) from Theorem 1.1 and the inequality (1.22) from
Theorem 1.3 are both sharp. For $\lambda \in \mathbb{R}$, consider the functional $E : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$
defined by

\[ E(\mu) := \int \frac{\lambda |x|^2}{2} \, d\mu . \tag{4.1} \]

As shown in [2, Example 9.3.1], $E$ is proper, coercive, lower semicontinuous, and $\lambda$-convex along
generalized geodesics.

4.1 PROPOSITION. For $E$ given by (4.1), $\lambda > 0$, and $\tau > 0$, define $\lambda_\tau := \frac{\lambda}{1+\lambda\tau}$. Then $E_\tau$ is
$\lambda_\tau$-convex, and no more.

4.2 PROPOSITION. For $E$ given by (4.1), $\mu, \nu \in D(E)$, and $\tau > 0$ small enough so that
$\lambda\tau > -1$, there is equality in (1.22).

We first prove the following lemma. For $E$ given by (4.1), it is well-known that the proximal
map is simply a scale transformation:

4.3 LEMMA. For $E$ given by (4.1), $\mu \in D(E)$, and $\tau > 0$ small enough so that $\lambda\tau > -1$, the
proximal map associated to $E$ is the scale transformation

\[ \mu \mapsto \mu_\tau = (1+\lambda\tau)^{-1} \mathrm{id} \# \mu , \tag{4.2} \]

where $\mathrm{id}(x) = x$ is the identity transformation. Moreover, for any $\mu, \nu \in D(E)$,

\[ W_2^2(\mu_\tau,\nu_\tau) = \frac{1}{(1+\lambda\tau)^2} W_2^2(\mu,\nu) \tag{4.3} \]

and

\[ W_2^2(\mu,\nu_\tau) = \frac{1}{1+\lambda\tau} \left[ W_2^2(\mu,\nu) + 2\tau \left( E(\mu) - \frac{1}{1+\lambda\tau} E(\nu) \right) \right] . \tag{4.4} \]
(4.4) 
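Proof. At any $\mu \in D(E)$,

\[ \nabla \frac{\delta E}{\delta \rho}(\mu) = \nabla \frac{\lambda |x|^2}{2} = \lambda x . \tag{4.5} \]

For $\tau > 0$ small enough so that $\lambda\tau > -1$, the Euler-Lagrange equation (2.5) becomes

\[ t_{\mu_\tau}^{\mu}(x) = x + \lambda\tau x = (1+\lambda\tau) x , \]

$\mu_\tau$-almost everywhere. This shows (4.2):

\[ (1+\lambda\tau)^{-1} \mathrm{id} \# \mu = \mu_\tau . \]

Next, fix $\phi : \mathbb{R}^d \to \mathbb{R}$ convex and define $\nu := \nabla\phi \# \mu$. By uniqueness in the Brenier-McCann
theorem, $\nabla\phi$ is the optimal transport map from $\mu$ to $\nu$. If $\psi$ is defined by

\[ \psi(x) = (1+\lambda\tau)^{-2} \phi((1+\lambda\tau)x) , \]

$\psi$ is convex and $\nabla\psi \# \mu_\tau = \nu_\tau$. Again, by uniqueness in the Brenier-McCann Theorem, $\nabla\psi$ is the
optimal transport map between $\mu_\tau$ and $\nu_\tau$. Consequently,

\begin{align*}
W_2^2(\mu_\tau,\nu_\tau) &= \int |\nabla\psi(x) - x|^2 \, d\mu_\tau \\
&= (1+\lambda\tau)^{-2} \int |\nabla\phi((1+\lambda\tau)x) - (1+\lambda\tau)x|^2 \, d\mu_\tau \\
&= (1+\lambda\tau)^{-2} \int |\nabla\phi(x) - x|^2 \, d\mu \\
&= (1+\lambda\tau)^{-2} W_2^2(\mu,\nu) .
\end{align*}

This proves (4.3).

Finally, note that if $\phi$ is convex and $\nabla\phi \# \mu = \nu$, by the definition of $W_2$ and of $E$,

\[ 2 \int x \cdot \nabla\phi(x) \, d\mu = \frac{2}{\lambda} (E(\mu) + E(\nu)) - W_2^2(\mu,\nu) . \tag{4.6} \]

Using that

\[ (1+\lambda\tau)^{-1} \nabla\phi \# \mu = \nu_\tau , \]

we may argue as above to show

\begin{align*}
W_2^2(\mu,\nu_\tau) &= \int |(1+\lambda\tau)^{-1} \nabla\phi(x) - x|^2 \, d\mu \\
&= \frac{2}{\lambda} (1+\lambda\tau)^{-2} E(\nu) + \frac{2}{\lambda} E(\mu) - 2 (1+\lambda\tau)^{-1} \int x \cdot \nabla\phi(x) \, d\mu .
\end{align*}

Combining this with (4.6) proves (4.4).

□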
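The identities (4.3) and (4.4) can be sanity-checked on one-dimensional Gaussians. This is only an illustrative sketch under stated assumptions: it relies on the standard closed form $W_2^2 = (m_1 - m_2)^2 + (s_1 - s_2)^2$ for Gaussians $N(m_i, s_i^2)$ on the real line, it represents a Gaussian as a hypothetical `(mean, std)` tuple, and it takes the scaling formula (4.2) as given rather than solving the variational problem.

```python
import math


def w2_sq(g1, g2):
    """Squared 2-Wasserstein distance between 1D Gaussians given as (mean, std)."""
    return (g1[0] - g2[0]) ** 2 + (g1[1] - g2[1]) ** 2


def energy(g, lam):
    """E(mu) = (lam/2) * second moment of mu, as in (4.1), for a 1D Gaussian."""
    mean, std = g
    return 0.5 * lam * (mean**2 + std**2)


def prox(g, lam, tau):
    """Proximal map (4.2): push forward by x -> x / (1 + lam*tau); needs lam*tau > -1."""
    c = 1.0 / (1.0 + lam * tau)
    return (c * g[0], c * g[1])


def rhs_4_3(mu, nu, lam, tau):
    """Right-hand side of (4.3)."""
    return w2_sq(mu, nu) / (1.0 + lam * tau) ** 2


def rhs_4_4(mu, nu, lam, tau):
    """Right-hand side of (4.4)."""
    c = 1.0 / (1.0 + lam * tau)
    return c * (w2_sq(mu, nu) + 2.0 * tau * (energy(mu, lam) - c * energy(nu, lam)))
```

Comparing `w2_sq(prox(mu, ...), prox(nu, ...))` with `rhs_4_3` and `w2_sq(mu, prox(nu, ...))` with `rhs_4_4` reproduces both identities, including for negative $\lambda$ with $\lambda\tau > -1$.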



Proof of Proposition 4.1. We first explicitly compute the Moreau-Yosida regularization of $E$. It
follows from (4.2) and the definition of $E$ that for all $\mu \in D(E)$ and $0 < \tau < \infty$,

\[ W_2^2(\mu,\mu_\tau) = 2\lambda\tau^2 E(\mu_\tau) . \tag{4.7} \]

Again by (4.2),

\[ E(\mu_\tau) = (1+\lambda\tau)^{-2} E(\mu) . \tag{4.8} \]

Hence,

\[ E_\tau(\mu) = \frac{1}{2\tau} W_2^2(\mu,\mu_\tau) + E(\mu_\tau) = (1+\lambda\tau) E(\mu_\tau) = \frac{1}{1+\lambda\tau} E(\mu) . \]

Thus, the Moreau-Yosida regularization of $E$ in this (already very regular) case simply multiplies
$E$ by a constant.

It is a standard result (see e.g. [2]) that $E$ is $\lambda$-convex, and no more. (Its Hessian with
respect to the $W_2$ Riemannian metric is $\lambda$ times the identity.) It then follows immediately from
$E_\tau(\mu) = \frac{1}{1+\lambda\tau} E(\mu)$ that $E_\tau$ is no more than $\lambda_\tau$-convex.

□
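Because $E$ in (4.1) depends on $\mu$ only through its second moment, the computation in this proof reduces to arithmetic on a single number. The sketch below is illustrative only: `m2` stands for $\int |x|^2 d\mu$, the helper names are hypothetical, and the grid search minimizes over scalings $x \mapsto kx$ alone, which is an assumption justified by (4.2) and not a search over all measures.

```python
def regularized_energy(m2, lam, tau):
    """Return (W2^2(mu, mu_tau), E(mu_tau), E_tau(mu)) for E in (4.1), given the second moment m2."""
    c = 1.0 / (1.0 + lam * tau)          # scaling factor from (4.2)
    w2_sq_to_prox = (1.0 - c) ** 2 * m2   # integral of |x - c x|^2 d(mu)
    e_prox = 0.5 * lam * c**2 * m2        # E evaluated at the scaled measure
    return w2_sq_to_prox, e_prox, w2_sq_to_prox / (2.0 * tau) + e_prox


def best_scaling_on_grid(m2, lam, tau, n=2000):
    """Minimize (1/(2 tau)) W2^2(mu, k#mu) + E(k#mu) over scalings k on a uniform grid in [0, 2]."""
    def objective(k):
        return (1.0 - k) ** 2 * m2 / (2.0 * tau) + 0.5 * lam * k**2 * m2
    return min((i * 2.0 / n for i in range(n + 1)), key=objective)
```

With `m2 = 3.0`, `lam = 2.0`, `tau = 0.25`, the three returned values illustrate (4.7), (4.8), and $E_\tau = E/(1+\lambda\tau)$, and the grid minimizer sits next to $1/(1+\lambda\tau)$.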



Proof of Proposition 4.2. We proceed by using Lemma 4.3 to express quantities appearing on either
side of (1.22) in terms of $W_2^2(\mu,\nu)$, $E(\mu)$, and $E(\nu)$. By the symmetry of $\mu$ and $\nu$, equations (4.3)
and (4.4) allow us to express $W_2^2(\mu_\tau,\nu_\tau)$, $W_2^2(\mu,\nu_\tau)$, and $W_2^2(\nu,\mu_\tau)$ in these terms. By (2.6), (4.5),
(4.7), and (4.8),

\[ \tau^2 |\nabla_W E(\mu)|^2 = \int |\tau\lambda x|^2 \, d\mu = 2\lambda\tau^2 E(\mu) \quad \text{and} \quad \tau^2 |\nabla_W E(\mu_\tau)|^2 = W_2^2(\mu,\mu_\tau) = \frac{2\lambda\tau^2 E(\mu)}{(1+\lambda\tau)^2} . \]

Symmetric identities hold with $\nu$ in place of $\mu$.

Finally, direct calculation shows that both sides of (1.22) are equal to

\[ -\frac{2\lambda\tau + \lambda^2\tau^2}{(1+\lambda\tau)^2} \left[ W_2^2(\mu,\nu) + \lambda\tau^2 (E(\mu) + E(\nu)) \right] . \]

□

As we see from (4.3), the proximal map for $E$ is always contracting in the $W_2$ metric for $\lambda > 0$.
Thus, in this example, the additional terms in $\Lambda_\tau$ are not required to produce contraction. The
point of this example is rather to show that (1.16) and (1.22) are sharp.



4.2 The Discrete Gradient Flow for Entropy and Rényi Entropies

In our second example, we consider functionals $E_p$ corresponding to the entropy and Rényi entropies.
We apply Theorem 1.3 to obtain a sharp bound, uniformly in the steps of the discrete gradient flow
sequence, on the rate at which rescaled solutions of the discrete gradient flow converge to certain
limiting densities, known as Barenblatt densities. This result mirrors a well-known result obtained
by Otto for the corresponding continuous gradient flow. In carrying out this analysis, we learn that
the discrete gradient flow is surprisingly well-behaved, not only on average, but also uniformly in
the steps. We also show that Otto's sharp results for the continuous gradient flow can be
obtained very efficiently from the analysis of the discrete flow.

First, we define the functionals to be considered. For $p > 1 - 1/d$,¹ define $U_p : \mathbb{R}_+ \to \mathbb{R}$ by

\[ U_p(s) := \begin{cases} \dfrac{s^p - s}{p-1} & \text{if } p \ne 1 \\ s \log s & \text{if } p = 1 . \end{cases} \]

¹ The borderline case $p = 1 - 1/d$ is more involved, and, for the sake of simplicity, we do not consider it in this
paper. It may be possible to extend our approach to this case using the regularization techniques developed in

Let $\mathcal{P}_2^a(\mathbb{R}^d)$ be the set of probability measures with finite second moment that are absolutely
continuous with respect to the Lebesgue measure. Define the functional $E_p : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} \cup \{\infty\}$ by

\[ E_p(\mu) := \begin{cases} \int_{\mathbb{R}^d} U_p(f(x)) \, dx & \text{if } \mu \in \mathcal{P}_2^a(\mathbb{R}^d), \ d\mu(x) = f(x) \, dx \\ \infty & \text{otherwise.} \end{cases} \]

For $p = 1$, $E_p$ is minus the entropy. For $p \ne 1$, $E_p$ is minus the Rényi entropy. As shown in
[2, Example 9.3.6], $E_p$ is proper, lower semicontinuous, and convex along generalized geodesics. As
for coercivity, for $p > 1$, $E_p$ is bounded below by $-1/(p-1)$, hence coercive. For $1 - 1/d < p < 1$,
$E_p$ is not bounded below, since $\int_{\mathbb{R}^d} f^p(x) \, dx$ can be arbitrarily large. $E_1$ is neither bounded above
nor below. Nevertheless, $E_p$ is coercive for $p > 1 - 1/d$, when $d \ge 2$, and for $p > 1/3$, when $d = 1$.
Later, we shall need some of the estimates that imply this, so we now explain this case. The $p = 1$
case can be found in [10].

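By Hölder's inequality, with exponents $1/p$ and $1/(1-p)$, for all $\nu \in \mathcal{P}_2^a(\mathbb{R}^d)$ with $d\nu = f(x) \, dx$,

\begin{align*}
\int_{\mathbb{R}^d} f^p(x) \, dx &= \int_{\mathbb{R}^d} f^p(x) (1+|x|^2)^p (1+|x|^2)^{-p} \, dx \\
&\le \left( \int_{\mathbb{R}^d} f(x) (1+|x|^2) \, dx \right)^p \left( \int_{\mathbb{R}^d} (1+|x|^2)^{-p/(1-p)} \, dx \right)^{1-p} .
\end{align*}

Furthermore, $\int_{\mathbb{R}^d} f(x) |x|^2 \, dx = \int |x|^2 \, d\nu = W_2^2(\nu,\delta_0)$, where $\delta_0$ is the Dirac mass at the origin.
By the triangle inequality, for any $\mu \in \mathcal{P}_2(\mathbb{R}^d)$,

\[ W_2(\nu,\delta_0) \le W_2(\mu,\nu) + W_2(\mu,\delta_0) , \]

so that

\[ \int_{\mathbb{R}^d} f^p(x) \, dx \le \left( \int_{\mathbb{R}^d} (1+|x|^2)^{-p/(1-p)} \, dx \right)^{1-p} \left( 1 + (W_2(\mu,\nu) + W_2(\mu,\delta_0))^2 \right)^p . \]

Finally, defining

\[ C_p := \frac{1}{1-p} \left( \int_{\mathbb{R}^d} (1+|x|^2)^{-p/(1-p)} \, dx \right)^{1-p} , \]

we have for all $\mu \in \mathcal{P}_2(\mathbb{R}^d)$,

\[ E_p(\nu) \ge -C_p \left( 1 + 2 \int |x|^2 \, d\mu + 2 W_2^2(\mu,\nu) \right)^p . \tag{4.9} \]

Thus, for all $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$,

\[ \frac{1}{2\tau} W_2^2(\mu,\nu) + E_p(\nu) \ge \frac{1}{2\tau} W_2^2(\mu,\nu) - C_p \left( 1 + 2 \int |x|^2 \, d\mu + 2 W_2^2(\mu,\nu) \right)^p . \tag{4.10} \]

For fixed $\mu$, the right hand side is bounded below for all $\tau > 0$ and $\nu \in \mathcal{P}_2(\mathbb{R}^d)$, hence $E_p$ is coercive.

Note that the condition $p > 1 - 1/d$, when $d \ge 2$, and $p > 1/3$, when $d = 1$, is exactly the
condition to ensure $C_p$ is finite, and it is easy to see that coercivity fails when this is not the case.
For a more general result, see [2, Remark 9.3.7].

From this analysis, we may also extract an upper bound on $W_2(\mu,\mu_\tau)$ which will be useful later.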
4.4 LEMMA (Distance bound for the proximal map). If $d \ge 2$, fix $p > 1 - 1/d$, and if $d = 1$, fix
$p > 1/3$. Let $\mu \in D(E_p)$ and

\[ M(\mu) := 1 + 2 \int |x|^2 \, d\mu . \]

Then for all $\tau$ small enough that $4 p C_p \tau < 1$,

\[ W_2^2(\mu,\mu_\tau) \le 2\tau \, \frac{E_p(\mu) + C_p M(\mu)}{1 - 4 p C_p \tau} . \]

A similar, but more complicated, bound in terms of the same quantities holds for all $\tau > 0$.

Proof. By the definition of the proximal map, taking $\nu = \mu$ in the variational problem (1.7), we
obtain

\[ E_p(\mu) \ge \frac{1}{2\tau} W_2^2(\mu,\mu_\tau) + E_p(\mu_\tau) . \]

Then, by (4.10) with $\nu = \mu_\tau$ and Bernoulli's inequality, $(1+u)^p \le 1 + pu$,

\begin{align*}
E_p(\mu) &\ge \frac{1}{2\tau} W_2^2(\mu,\mu_\tau) - C_p \left( M(\mu) + 2 W_2^2(\mu,\mu_\tau) \right)^p \\
&= \frac{1}{2\tau} W_2^2(\mu,\mu_\tau) - C_p M^p(\mu) \left( 1 + \frac{2 W_2^2(\mu,\mu_\tau)}{M(\mu)} \right)^p \\
&\ge \frac{1}{2\tau} W_2^2(\mu,\mu_\tau) - C_p M^p(\mu) \left( 1 + p \, \frac{2 W_2^2(\mu,\mu_\tau)}{M(\mu)} \right) \\
&\ge \left( \frac{1}{2\tau} - 2 p C_p \right) W_2^2(\mu,\mu_\tau) - C_p M(\mu) .
\end{align*}

In the last line, we used that $M(\mu) \ge 1$. Rearranging gives the stated bound.

The bound is simple due to the use of Bernoulli's inequality $(1+u)^p \le 1 + pu$. Avoiding this,
one obtains a bound without restriction on $\tau$. Since we are mostly concerned with small $\tau$, we leave
the details to the reader.

□

If $d \ge 2$, fix $p > 1 - 1/d$, and if $d = 1$, fix $p > 1/3$. Then, $E_p$ is proper, coercive, lower
semicontinuous, and convex along generalized geodesics. Therefore, Theorem 2.1 guarantees that
the proximal map and discrete gradient flow (1.10) are well-defined for $0 < \tau < \infty$, $\mu_0 \in D(E_p)$.
Before turning to the long-time asymptotics of the discrete gradient flow for $E_p$, we first investigate
the contraction properties of $\Lambda_\tau(\mu,\nu)$ under the proximal map.

Unlike the functional considered in Section 4.1, $E_p$ is translation invariant. Specifically, for fixed
$x_0 \in \mathbb{R}^d$, if $T_{x_0}$ is the translation given by

\[ T_{x_0} \mu := (\mathrm{id} - x_0) \# \mu , \]

then $E_p(T_{x_0}\mu) = E_p(\mu)$. The 2-Wasserstein distance is also translation invariant: for any $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$,
$W_2(T_{x_0}\mu, T_{x_0}\nu) = W_2(\mu,\nu)$. Consequently, the proximal map associated to $E_p$ commutes with translations:

\[ (T_{x_0}\mu)_\tau = T_{x_0}(\mu_\tau) . \]

On one hand, this implies that the proximal map does not contract strictly in $W_2$. For any
$\nu \in \mathcal{P}_2^a(\mathbb{R}^d)$, $W_2^2(\nu, T_{x_0}\nu) = |x_0|^2$, so

\[ W_2^2(\nu_\tau, (T_{x_0}\nu)_\tau) = W_2^2(\nu_\tau, T_{x_0}\nu_\tau) = |x_0|^2 = W_2^2(\nu, T_{x_0}\nu) . \]

On the other hand, because the functional $E_p$ is strictly convex [2], [12], strict inequality holds in
(3.3) and hence in (1.21) of Theorem 1.3.
Therefore, $\Lambda_\tau(\mu,\nu)$ is strictly decreasing under the proximal map, even though $W_2(\mu,\nu)$ is not.

We now turn to the long-time asymptotics of the discrete gradient flow for $E_p$. As shown by
Otto [TU], as $\tau \to 0$ the discrete gradient flow tends to the continuous gradient flow on
$\mathcal{P}_2(\mathbb{R}^d)$, which corresponds to the porous medium equation or the fast diffusion equation:

\[ \frac{\partial}{\partial t} \rho(t,x) = \Delta \rho(t,x)^p . \tag{4.11} \]

(For $p < 1$ this is the fast diffusion equation. For $p > 1$, it is the porous medium equation.) We
show that for each $\tau > 0$, the discrete flow is a strikingly close analogue of the continuous flow.
A key feature of (4.11) is that it has self-similar scaling solutions known as Barenblatt solutions,

\[ \sigma_p(t,x) := t^{-d\beta} h_p\!\left( \frac{x}{t^\beta} \right) , \tag{4.12} \]

where

\[ \beta := \frac{1}{2 + d(p-1)} \tag{4.13} \]

and

\[ h_p(x) := \begin{cases} \left( \lambda + \dfrac{(1-p)\beta}{2p} |x|^2 \right)^{1/(p-1)} & \text{if } 1 - \frac{1}{d} < p < 1 \\[2mm] \lambda e^{-|x|^2/4} & \text{if } p = 1 \\[2mm] \left( \lambda + \dfrac{(1-p)\beta}{2p} |x|^2 \right)_+^{1/(p-1)} & \text{if } p > 1 , \end{cases} \tag{4.14} \]

with normalizing constants $\lambda = \lambda(d,p)$ so that $\int_{\mathbb{R}^d} \sigma_p(t,x) \, dx = \int_{\mathbb{R}^d} h_p(x) \, dx = 1$.

4.5 DEFINITION (Barenblatt density). If $\mu$ is a probability measure of the form $d\mu = \sigma_p(t,x) \, dx$,
we call $\mu$ a Barenblatt density. Going forward, we will simply write $\mu = \sigma_p(t,x) \, dx$.

We now show that the Barenblatt densities are preserved under the discrete gradient flow.
Before stating the next proposition, let us observe that $0 < \beta < 1$ for all values of $p > 1 - 1/d$.
Thus, the function $s \mapsto s^\beta - \tau\beta s^{\beta-1}$ is strictly monotone increasing for $s > 0$ and yields the value
$0$ for $s = \tau\beta$. Consequently, for any $r > 0$, there is a unique $s > \tau\beta$ such that

\[ s^\beta - \tau\beta s^{\beta-1} = r^\beta . \tag{4.15} \]

4.6 DEFINITION (Proximal time-shift function). Define the proximal time-shift function
$\theta_\tau : \mathbb{R}_+ \to \mathbb{R}_+$ so that, for any $r > 0$, $\theta_\tau(r)$ is the unique value of $s$ that solves (4.15).

We have already observed that $\theta_\tau(r) > \tau\beta$ for all $r > 0$. Since $r^\beta - \tau\beta r^{\beta-1} < r^\beta$ for all $r > 0$,
$\theta_\tau(r) > r$. The following proposition generalizes a result in [7] for the $p = 1$ case, showing that the
proximal map for the functional $E_p$ takes $\sigma_p(r,x) \, dx$ to $\sigma_p(\theta_\tau(r),x) \, dx$. Thus the proximal map
takes a Barenblatt density to a Barenblatt density with a larger "time parameter". Given that the
class of Barenblatt densities is preserved at the discrete level, we would of course expect the time
parameter to increase.
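Since (4.15) has no closed-form solution in general, $\theta_\tau(r)$ can be computed numerically. The following is a minimal sketch using bisection, which is justified by the strict monotonicity noted above; the function names `beta_exponent`, `time_shift_residual`, and `time_shift`, as well as the tolerance and the bracket-doubling strategy, are illustrative assumptions rather than anything prescribed by the text.

```python
def beta_exponent(d, p):
    """beta = 1 / (2 + d (p - 1)), as in (4.13); lies in (0, 1) when p > 1 - 1/d."""
    return 1.0 / (2.0 + d * (p - 1.0))


def time_shift_residual(s, r, tau, beta):
    """Left side minus right side of (4.15): s^beta - tau*beta*s^(beta-1) - r^beta."""
    return s**beta - tau * beta * s ** (beta - 1.0) - r**beta


def time_shift(r, tau, beta, tol=1e-13, max_iter=200):
    """Solve (4.15) for s = theta_tau(r) by bisection on an increasing function."""
    lo = max(r, tau * beta)  # residual is negative at both r and tau*beta
    hi = lo
    while time_shift_residual(hi, r, tau, beta) < 0.0:
        hi *= 2.0            # expand until the root is bracketed
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if time_shift_residual(mid, r, tau, beta) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return 0.5 * (lo + hi)
```

For example, `time_shift(1.0, 0.1, beta_exponent(2, 1.5))` returns a value strictly between `1.0` and `1.1`, consistent with $\theta_\tau(r) > r$.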

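4.7 PROPOSITION. If $d \ge 2$, fix $p > 1 - 1/d$, and if $d = 1$, fix $p > 1/3$. Let $\mu$ be a Barenblatt
density, i.e. $\mu = \sigma_p(r,x) \, dx$ for some $r > 0$. Then, for $\tau > 0$, the image of $\mu$ under the proximal
map for $E_p$ is of the form

\[ \mu_\tau = \sigma_p(\theta_\tau(r), x) \, dx . \tag{4.16} \]

Proof. Given a Barenblatt density $\mu = \sigma_p(r,x) \, dx$ for some $r > 0$, let $s := \theta_\tau(r)$ and $\nu := \sigma_p(s,x) \, dx$.
We compute

\[ \nabla \frac{\delta E_p}{\delta \rho}(\nu) = U_p''(\sigma_p(s,x)) \nabla \sigma_p(s,x) = p \, \sigma_p(s,x)^{p-2} \nabla \sigma_p(s,x) = -\frac{\beta x}{s} \quad \nu\text{-almost everywhere.} \tag{4.17} \]

Next, note that since $s = \theta_\tau(r) > \tau\beta$,

\[ \nabla\psi(x) := x + \tau \nabla \frac{\delta E_p}{\delta \rho}(\nu) = \left( 1 - \frac{\tau\beta}{s} \right) x \]

is the gradient of a convex function. Consequently, if we define

\[ \rho := \nabla\psi \# \nu , \tag{4.18} \]

uniqueness in the Brenier-McCann Theorem guarantees that $\nabla\psi$ is the optimal transport map
between $\nu$ and $\rho$. Since $\nabla\psi = t_\nu^\rho = \mathrm{id} + \tau \nabla \frac{\delta E_p}{\delta \rho}(\nu)$ is the Euler-Lagrange equation (2.5), $\nu = \rho_\tau$,
the image of $\rho$ under the proximal map. With the explicit form of $\nabla\psi$ and $\sigma_p(s,x)$, we compute

\[ \rho = \sigma_p\!\left( \left( 1 - \frac{\tau\beta}{s} \right)^{1/\beta} s , \, x \right) dx . \]

By definition of $s = \theta_\tau(r)$,

\[ \left( 1 - \frac{\tau\beta}{s} \right)^{1/\beta} s = r . \]

Therefore, $\rho = \sigma_p(r,x) \, dx = \mu$, so $\mu_\tau = \rho_\tau = \nu = \sigma_p(s,x) \, dx = \sigma_p(\theta_\tau(r),x) \, dx$. □

Note that when $\tau$ is very small compared to $t > 0$, and hence also compared to $s := \theta_\tau(t)$,

\[ t = \left( 1 - \frac{\tau\beta}{s} \right)^{1/\beta} s \approx \left( 1 - \frac{\tau}{s} \right) s = s - \tau , \]

so $\theta_\tau(t) \approx t + \tau$. Thus, in this approximation, the proximal map shifts the time forward by $\tau$,
independent of $t$. To the extent this is accurate, it makes it very easy to understand the discrete
gradient flow for $E_p$ starting from a Barenblatt density: at the $n$th step of size $\tau$, one gets a
Barenblatt density whose time parameter has been increased by approximately $n\tau$. The following
lemma allows us to control this approximation in precise terms.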
4.8 LEMMA. Fix $\tau > 0$. Then, for all $t > \tau$,

r < er{t)-t < T (4.19) 

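Proof. Let $s := \theta_\tau(t)$ for any $t > \tau$. We recall that $0 < \beta < 1$ for all $p > 1 - 1/d$. By the definition
of $\theta_\tau$, we have

\[ t^\beta = s^\beta - \tau\beta s^{\beta-1} . \]

Assume $s > t + \tau$. Then, by Bernoulli's inequality $(1+u)^{1-\beta} \le 1 + (1-\beta)u$ with $u := \tau/t$,

\[ t^\beta = s^\beta - \tau\beta s^{\beta-1} > (t+\tau)^\beta - \tau\beta (t+\tau)^{\beta-1} = (t+\tau)^{\beta-1} (t + (1-\beta)\tau) = t^\beta (1+u)^{\beta-1} (1 + (1-\beta)u) \ge t^\beta . \]

This is a contradiction. Therefore, $\theta_\tau(t) = s \le t + \tau$, which proves the upper bound in (4.19).
To obtain the lower bound, we use the upper bound on $s$ and the relation $s = t (1 - \tau\beta/s)^{-1/\beta}$
to obtain $s \ge t \left( 1 - \frac{\tau\beta}{t+\tau} \right)^{-1/\beta}$. Then since $(1+u)^{-1/\beta} \ge 1 - u/\beta$ and $t > \tau$,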

s>t( 1 + 4^^ I >t + T 



Pt + Tj ~ \r + T 



□ 



We may now use Theorem 1.3 to control the rate at which rescaled solutions to the discrete
gradient flow converge to a Barenblatt density. First, we define the rescaled discrete gradient flow.
For any positive integer $n$, let $\theta_\tau^n$ be the $n$-fold composition of $\theta_\tau$ with itself. For $t > 0$, let $S_t$ denote the scaling
transformation given by

\[ S_t \nu := (t^{-\beta} \mathrm{id}) \# \nu . \]

Since $t^{-\beta} x$ is the gradient of a convex function, uniqueness in the Brenier-McCann Theorem implies
that it is the optimal transport map from $\nu$ to $S_t \nu$.

Let $\mu$ be a Barenblatt density, i.e., $\mu = \sigma_p(r,x) \, dx$ for some $r > 0$. Then $S_r \mu = h_p(x) \, dx$. Let
$\{\mu_n\}$ be the discrete gradient flow with initial data $\mu$ for fixed $\tau > 0$. By Proposition 4.7,

\[ J_\tau^n \mu = \mu_n = \sigma_p(\theta_\tau^n(r), x) \, dx , \]

and by definition of the scaling transformation,

\[ S_{\theta_\tau^n(r)} J_\tau^n \mu = S_{\theta_\tau^n(r)} \mu_n = h_p(x) \, dx \quad \text{for all } n \in \mathbb{N} . \tag{4.20} \]

Thus, each step of the discrete gradient flow sequence is also a rescaling of $h_p(x) \, dx$.

In fact, something almost as good holds even when the initial data of the discrete gradient flow
is not a Barenblatt density. We apply Theorem 1.3 to prove that if $\{\nu_n\}$ is a discrete gradient flow
with initial data $\nu \in D(E_p)$ for fixed $\tau > 0$, then

\[ \lim_{n \to \infty} S_{\theta_\tau^n(r)} J_\tau^n \nu = \lim_{n \to \infty} S_{\theta_\tau^n(r)} \nu_n = h_p(x) \, dx . \]

That is, if you wait a while and scale the solution to view it in a fixed length scale, what you see is
(essentially) a Barenblatt density, no matter what the initial data $\nu \in D(E_p)$ looked like. Moreover,
we show that $W_2(S_{\theta_\tau^n(r)} \nu_n, h_p(x) \, dx)$ essentially contracts at a precise polynomial rate.
4.9 THEOREM (Discrete fast diffusion and porous medium flow). If $d \ge 2$, fix $p > 1 - 1/d$, and
if $d = 1$, fix $p > 1/3$. Let $\nu \in D(E_p)$ and let $\mu = \sigma_p(r,x) \, dx$ for some $r > 0$. Given $0 < \tau < 1$,
let $\{\nu_n\}$ and $\{\mu_n\}$ be the discrete gradient flows with initial conditions $\nu$ and $\mu$. Define the
rescaled discrete gradient flow sequence

\[ V_n := S_{\theta_\tau^n(r)} \nu_n . \]

Then, there is an explicitly computable constant $K$ depending only on $d$, $p$, $r$, $E_p(\nu)$, and

\[ M(\nu) = 1 + 2 \int |x|^2 \, d\nu , \]

so that

\[ W_2^2(V_n, h_p(x) \, dx) \le (\theta_\tau^n(r))^{-2\beta} \left[ W_2(\nu,\mu) \left[ W_2(\nu,\mu) + \tau^{1/2} K \right] + \tau K \right] . \tag{4.21} \]
From this, we readily recover Otto's contraction result for a continuous gradient flow as follows.
For any $t > 0$, let $\mathrm{int}(t/\tau)$ denote the integer part of $t/\tau$. By Lemma 4.8, $\theta_\tau(t) = t + \tau$, up to an
error that vanishes uniformly in $t$ as $\tau \to 0$. Thus, a simple iteration yields

\[ \lim_{\tau \to 0} \theta_\tau^{\mathrm{int}(t/\tau)}(r) = r + t . \tag{4.22} \]
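The limit (4.22) can be observed numerically by iterating the time-shift map. The sketch below is illustrative only: it solves (4.15) by bisection with a hypothetical helper `solve_time_shift`, uses arbitrary sample values of $r$, $t$, $\tau$, and $\beta$, and records each increment $\theta_\tau(s) - s$. The lower bound $\tau s/(s+\tau)$ mentioned in the usage note is what the Bernoulli step in the proof of Lemma 4.8 yields when carried out; it is stated here as an assumption, since the lemma's displayed lower bound may be phrased differently.

```python
def solve_time_shift(r, tau, beta, tol=1e-13):
    """Solve s^beta - tau*beta*s^(beta-1) = r^beta for s > max(r, tau*beta) by bisection."""
    def residual(s):
        return s**beta - tau * beta * s ** (beta - 1.0) - r**beta
    lo = max(r, tau * beta)
    hi = 2.0 * lo
    while residual(hi) < 0.0:   # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterate_time_shift(r, t, tau, beta):
    """Apply theta_tau int(t/tau) times starting from r; return the final value and all increments."""
    s = r
    increments = []
    for _ in range(int(t / tau)):
        nxt = solve_time_shift(s, tau, beta)
        increments.append(nxt - s)
        s = nxt
    return s, increments
```

With `r = 1.0`, `t = 2.0`, and `beta = 1/3`, the final value approaches `3.0` as `tau` decreases, every increment is at most `tau`, and every increment is at least `tau * r / (r + tau)`.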



Interpolating and taking the limit $\tau \to 0$ as in PH], one obtains from $\{\nu_n\}$ a solution $\rho(t,x)$ to
$\frac{\partial}{\partial t} \rho(t,x) = \Delta \rho(t,x)^p$ with $\rho(0,x) \, dx = \nu_0$. Define the rescaled solution

\[ \tilde\rho(t,x) := (r+t)^{d\beta} \rho(t, (r+t)^\beta x) . \]

We then conclude that, for all $t > 0$,

\[ W_2^2(\tilde\rho(t,x) \, dx , \, h_p(x) \, dx) \le (r+t)^{-2\beta} W_2^2(\rho(0,x) \, dx , \, \sigma_p(r,x) \, dx) . \]

One may choose $r$ to minimize $W_2^2(\rho(0,x) \, dx , \sigma_p(r,x) \, dx)$. Otto has shown this contraction result
is sharp. Hence the "near contraction" result we obtain in the discrete setting cannot be improved
in any manner that is uniform in $\tau$.

Other aspects of Otto's analysis that leverage this contraction into a bound on convergence
may be applied at the discrete level without difficulty, and we do not go into the details here. On
the other hand, while Otto proves a continuous gradient flow analogue of Theorem 1.3, his proof
does not extend to the discrete case. Theorem 1.3 provides the means to carry out the discrete
analysis and to show that the discrete gradient flow analogue of (4.11) is surprisingly complete.

Proof of Theorem 4.9. By Theorem 1.3, applied iteratively, we have



(4.23) 



Note that we make the comparison with $\Lambda_\tau(\nu_\tau,\mu_\tau)$, not $\Lambda_\tau(\nu,\mu)$, since $|\nabla_W E_p(\nu)|^2$ (and hence
$\Lambda_\tau(\mu,\nu)$) may be infinite, but by [2, Theorem 3.1.6], the strict convexity of $E$ implies



(4.24) 






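so $\Lambda_\tau(\nu_\tau,\mu_\tau) < \infty$. We shall show that $\Lambda_\tau(\nu_\tau,\mu_\tau)$ is very close to $W_2^2(\nu,\mu)$, differing by a term
that is $O(\tau^{1/2})$. Specifically, there exists a constant $K$ depending only on $d$, $p$, $r$, $E_p(\nu)$, and $M(\nu)$,
such that

\[ \Lambda_\tau(\nu_\tau,\mu_\tau) \le W_2(\nu,\mu) \left[ W_2(\nu,\mu) + \tau^{1/2} K \right] + \tau K . \tag{4.25} \]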



Using this in (4.23), we obtain 



(4.26) 



Next, by the scaling properties of the 2-Wasserstein metric and (4.20), for all $n \ge 1$,

\[ (\theta_\tau^n(r))^{-2\beta} W_2^2(\nu_n,\mu_n) = W_2^2(S_{\theta_\tau^n(r)} \nu_n , S_{\theta_\tau^n(r)} \mu_n) = W_2^2(V_n, h_p(x) \, dx) . \]



Therefore, 



V2, 



which is ( |4.21 ) . 

It remains to prove (4.25). First, note that since $\mu = \sigma_p(r,x) \, dx$, (4.17) implies
$\nabla \frac{\delta E_p}{\delta \rho}(\mu) = -\frac{\beta x}{r}$. Thus, by Lemma 2.2 and the definition of the length of the gradient (1.9),



f 



(4.27) 



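We will consider the cases $p < 1$, $p = 1$, and $p > 1$ separately. For $1 - \frac{1}{d} < p < 1$, when $d \ge 2$,
and $1/3 < p < 1$, when $d = 1$, we may use the bound on $W_2(\nu,\nu_\tau)$ provided by Lemma 4.4 to show

\[ W_2^2(\nu,\nu_\tau) \le 2\tau \, \frac{E_p(\nu) + C_p M(\nu)}{1 - 4 p C_p \tau} . \tag{4.28} \]

(This particular bound requires $4 p C_p \tau < 1$, but one may prove a similar bound with a more
complicated constant that holds for all $\tau > 0$.) By the triangle inequality,

\begin{align*}
W_2^2(\mu_\tau,\nu_\tau) &\le \left( W_2(\mu,\nu) + W_2(\mu,\mu_\tau) + W_2(\nu,\nu_\tau) \right)^2 \\
&\le W_2^2(\mu,\nu) + 2 W_2(\mu,\nu) \left[ W_2(\mu,\mu_\tau) + W_2(\nu,\nu_\tau) \right] + 2 W_2^2(\mu,\mu_\tau) + 2 W_2^2(\nu,\nu_\tau) .
\end{align*}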


Combining this with (4.28) and (4.27) gives 
Ar(/ir,i^r) < W^{n,iy) + 2W2ip,i^) I 2r 



Ep{u) + CpM{u) 



1 - ArpCp 



1/2 



+ r 



|x|^cjp(r, x)dx 



1/2 



Ep{v) + CpM{u) 5 2^ 
l-4rpC„ 2^ r2 



|x|^(7p(r, x)dx . 



This leads directly to (4.25) with an explicit constant. 

For $p > 1$, by Lemma 2.2 and the definition of the proximal map,

\[ \tau^2 |\nabla_W E_p(\nu_\tau)|^2 \le W_2^2(\nu,\nu_\tau) \le 2\tau \left[ E_p(\nu) - E_p(\nu_\tau) \right] . \]



Since Ep is bounded below, an analogous argument leads to (4.25). 

The case p = 1 is similar to the case p < 1; we leave the details to the reader. 



□ 